0602
oogh i'm so weary. looking forward to doing something pleasant today,
like eating breakfast or napping or writing some code that's _easy_ to
write, not that makes my eyes spasm around and stuff.
anyway i'll copy some chunks back and forth.
0603
self[idx:] = (
#(leaf_count_of_partial_index_at_end_tmp,
running_size, spliced_out_start - running_size, last_publish),
(new_leaf_count, running_size, new_size, last_publish),
(-1, 0, spliced_in_size, spliced_in_data)
)
note: this code didn't show the behavior of empty branches
0603
class append_tree:
def __init__(self, degree = 2):
self.obj = mix_indices.append_tree(degree)
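for context, a thin wrapper like this is often finished with attribute delegation. a sketch of that pattern, with a stand-in inner class since mix_indices itself isn't shown here (names are mine, not the real API):

```python
class AppendTreeWrapper:
    class _Inner:
        # hypothetical stand-in for mix_indices.append_tree, just so
        # the delegation below has something to forward to
        def __init__(self, degree):
            self.degree = degree
        def leaf_count(self):
            return 0

    def __init__(self, degree=2):
        self.obj = self._Inner(degree)

    def __getattr__(self, name):
        # forward any attribute not defined on the wrapper to the
        # wrapped object
        return getattr(self.obj, name)
```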
0603
huh i remember i was mid writing that
0605 i'm thinking of how the current "type" field shares value space
with the negative leaf count.
<= 0: leaf, whether leaf count or type
> 0: inner node, whether leaf count or type
:)
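a tiny sketch of that overlap, with hypothetical names (the real flat_tree field layout isn't shown here): a single signed integer doubles as both tag and count.

```python
def classify(field: int) -> str:
    # hypothetical illustration of the overlapping encoding noted above:
    # values <= 0 always mean "leaf" (whether read as a leaf count or a
    # type tag), values > 0 always mean "inner node". names are mine.
    if field <= 0:
        return 'leaf'
    return 'inner node with %d leaves' % field
```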
0605
0607
0617
this is from an iterate() function. i think it's in an append tree
implementation. not certain which file i have open:
subindex = stored_indices[subid]
#adjusted_start = start_offset - substartoffset + suboffset
#adjusted_end = end_offset - substartoffset + suboffset
#data = b''.join(iterate(subindex, adjusted_start,
#                        min(adjusted_end, subsize - adjusted_start)))
#data = b''.join(iterate(subindex, start_offset, end_offset))
data = list(iterate(subindex, start_offset, end_offset))
assert len(data) == subleafcount
yield from data
i think actually it's from test.py in flat_tree
0618
0619
0652
0653
pip3 install git+https://github.com/xloem/flat_tree
i'll see if i can change the capturer to use this. consolidates the
approach a little.
0654
0711
an entry or two was lost when i tried to send to preserve. gmail said
there was a temporary issue with my account.
0712
what i was saying was that i noticed i got some delayed messages from
the mailserver, and i'm worried it could go down :S because there are
so many unsubscription messages and the functioning seems delayed. i
don't know why it would be hard to process unsubscription messages,
and i'm worried space could be exhausted.
reminding myself that greg has handled this stuff a lot in the past,
and the server will likely be fine.
0713
0736
i actually plugged the flat_tree code into the capture code and it
works fine with test data consisting of zeros. time to do a recording,
and compare with prior to upload, etc.
0737
0850
ok i did a test recording. the metadata i'm using to store and
reference stuff has changed. this is confusing for me. also, a goal
part of me has started getting funny, could be acting a little
psychosisy, unsure.
notes:
- an index is an array of references to data (nodes with children),
and describes a seekable stream region with a finite length
- i was using something that wasn't an array, to reference the data,
apparently, before
here's the file data from some old test:
{"dataitem": "ugKZULvxJ0jdQB5k4s9v9ntK3o4Edc16_2DXpYurEMM",
"current_block":
"gmYYdc12P-TGgKF8gRld3UfXfxNEHusmN-q656PfZR3198CjYRWk58g8WLNc7Nxm",
"end_offset": 400000}
here's the file data i'm using now:
{"ditem": ["O2eYAIajAci876PXp9p2FG1zDeg2VoWSLRxPwiFOtB8"],
"min_block": [995989,
"XA94QP2GfxJ5iOfKAzLsqscKoKQue1SvOiDGg8mzVqheEYa1NcofIcn7QX-P6FW-"],
"api_block": 996390}
here's an onchain index from yesterday:
[[1, {"ditem": ["nIOsVBh6IM0EWq10P882TH9OhFw272ByHFJZD8G6rrA"],
"min_block": [995159,
"-p20J8zfYeZn8jFYiV-X4I62ubge3RW-2pthuB_hN5LrKqA2L4tvX55fgwSoAatG"],
"api_block": 995559}, 0, 900000], [0, {"capture": {"ditem":
["rWTfslX9PzbtNeTjlrHmCHQXuW16nZg7iQ7WAY3Y-ZM"]}, "min_block":
[995159, "-p20J8zfYeZn8jFYiV-X4I62ubge3RW-2pthuB_hN5LrKqA2L4tvX55fgwSoAatG"],
"api_block": 995559}, 0, 100000]]
here's the one i'm testing with now:
[[9, {"ditem": ["G89Z1mwtX1RA4hw7EE6PY0uT5JIqLkORAPj5aHybgjc"],
"min_block": [995989,
"XA94QP2GfxJ5iOfKAzLsqscKoKQue1SvOiDGg8mzVqheEYa1NcofIcn7QX-P6FW-"],
"api_block": 996390}, 0, 900000], [1, {"ditem":
["evXLM3y9vl-6qvReT0VyyMJINtmU_9ceV0n6cJzqmE8"], "min_block": [995989,
"XA94QP2GfxJ5iOfKAzLsqscKoKQue1SvOiDGg8mzVqheEYa1NcofIcn7QX-P6FW-"],
"api_block": 996390}, 900000, 100000], [1, {"ditem":
["ufrSxEIdUr-byh3_GJdN9iAtcC7kwiRGnkA7IBUz_20"], "min_block": [995989,
"XA94QP2GfxJ5iOfKAzLsqscKoKQue1SvOiDGg8mzVqheEYa1NcofIcn7QX-P6FW-"],
"api_block": 996390}, 1000000, 100000], [-1, {"capture": {"ditem":
["zsmLsjKJj-2Vq5eLNhIo0BF6_PswS4hWCTpxqeDQ2fs"]}, "min_block":
[995989, "XA94QP2GfxJ5iOfKAzLsqscKoKQue1SvOiDGg8mzVqheEYa1NcofIcn7QX-P6FW-"],
"api_block": 996390}, 0, 100000]]
both are lists.
the code is throwing an error that something is a list when it thinks
it should be a dict. maybe it's inside the structure, unsure. 0855 .
0858
looks like i might have just been providing the wrong data by accident ...
... 0859 nope, it just takes a bit to cache big blocks
ok i'm going to note the parts of the exception context
TypeError: list indices must be integers or slices, not str
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /home/ubuntu/src/log/download.py(144)dataitem()
-> for height in range(preceding_height + 1, self.tail['api_block'] +
1) if hasattr(self, 'tail') else itertools.count(preceding_height +
1):
(Pdb) p self.tail
[[9, {'ditem': ['G89Z1mwtX1RA4hw7EE6PY0uT5JIqLkORAPj5aHybgjc'],
'min_block': [995989,
'XA94QP2GfxJ5iOfKAzLsqscKoKQue1SvOiDGg8mzVqheEYa1NcofIcn7QX-P6FW-'],
'api_block': 996390}, 0, 900000], [1, {'ditem':
['evXLM3y9vl-6qvReT0VyyMJINtmU_9ceV0n6cJzqmE8'], 'min_block': [995989,
'XA94QP2GfxJ5iOfKAzLsqscKoKQue1SvOiDGg8mzVqheEYa1NcofIcn7QX-P6FW-'],
'api_block': 996390}, 900000, 100000], [1, {'ditem':
['ufrSxEIdUr-byh3_GJdN9iAtcC7kwiRGnkA7IBUz_20'], 'min_block': [995989,
'XA94QP2GfxJ5iOfKAzLsqscKoKQue1SvOiDGg8mzVqheEYa1NcofIcn7QX-P6FW-'],
'api_block': 996390}, 1000000, 100000], [-1, {'capture': {'ditem':
['zsmLsjKJj-2Vq5eLNhIo0BF6_PswS4hWCTpxqeDQ2fs']}, 'min_block':
[995989, 'XA94QP2GfxJ5iOfKAzLsqscKoKQue1SvOiDGg8mzVqheEYa1NcofIcn7QX-P6FW-'],
'api_block': 996390}, 0, 100000]]
line 144
0905
ok i vaguely remember when the onchain metadata was a dict.
the thing i looked at from yesterday was a list, not a dict. must have
been a new format i was considering, maybe.
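since the old format was a dict and the new one is a list of entries, one way to cope during the transition is to branch on the type. this is a hypothetical helper, not the actual download.py code; field order for the list entries is taken from the dumps above:

```python
def tail_api_block(tail):
    # old metadata format: a single dict carrying 'api_block'
    if isinstance(tail, dict):
        return tail['api_block']
    # new on-chain index format: a list of [leaf_count, locator,
    # offset, size] entries, where each locator dict carries its own
    # 'api_block'; take the largest one
    return max(entry[1]['api_block'] for entry in tail)
```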
0916
that got it downloading data. the data differs from the upload.
0918
i diff'd the hex of the files and there is just some data missing from
the end, so it worked. great news.
0918
the total difference in size is 166016 bytes, which means a chunk was
dropped. but this is just the single-threaded test, so it's not too
important maybe.
0919 quick review of the on-chain format to see if it's reasonable
0920
$ curl -L https://arweave.net/O2eYAIajAci876PXp9p2FG1zDeg2VoWSLRxPwiFOtB8
| python3 -m json.tool
[ # root node
[ # first inner node
9, # leaf count, -1 means leaf
{ # data locator, anything could go here
"ditem": [ # sequence of data
"G89Z1mwtX1RA4hw7EE6PY0uT5JIqLkORAPj5aHybgjc"
],
"min_block": [ # block when data was sent out
995989,
"XA94QP2GfxJ5iOfKAzLsqscKoKQue1SvOiDGg8mzVqheEYa1NcofIcn7QX-P6FW-"
],
"api_block": 996390 # block the sending service said it
would be mined by
},
0, # offset into data this node references
900000 # number of bytes after offset that are referenced
],
# i'm using a free sending service that bundlr.network provides,
# paid for by arweave for all to use
seems reasonable for now. the addition of the library dependency makes
me more okay with it. things are more changeable when they're
organised.
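as a sanity check on the annotated format above, the entries are simple enough to walk mechanically. a sketch — the field order comes from the annotations, but the traversal logic itself is my assumption:

```python
def total_bytes(index):
    # each entry is [leaf_count, locator, offset, size]; summing the
    # size fields gives the number of bytes the index covers
    return sum(size for leaf_count, locator, offset, size in index)

def leaves(index):
    # entries with leaf_count == -1 reference data directly ("-1 means
    # leaf" per the annotation); positive counts reference subindices
    return [entry for entry in index if entry[0] == -1]
```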
0923
0924
so what i'd like to do here now is port the multithreaded uploader.
this gets me to the same place i was before, with a consistent data format.
0933
having some inhibition
the situation with the cpunks list has that
multiple-confusions-overlapping property that makes people more
manipulable by others, makes it harder to store reliable memories and
form reliable inferences, etc etc.
i'm gonna take one of those nice cold showers and do something else for a bit.