Re: [basex-talk] UPDINDEX and ever growing index size

Christian Grün Wed, 30 Jul 2014 06:45:07 -0700

Hi James,

I had some first thoughts on possible optimizations for the increasing
file size problem, and I may have found a fairly easy solution that
covers some of the current problems. It's not implemented yet, but I
could at least fix the initial 4096 byte problem [1].


I'll keep you updated,
Christian

[1] https://github.com/BaseXdb/basex/issues/970



On Sat, Jul 19, 2014 at 12:06 PM, Christian Grün
<christian.gr...@gmail.com> wrote:
> Hi James,
>
>> However the behaviour is different when using db:replace. I think it's doing 
>> a db:delete() and then a db:add(). So first the index file has the ID list 
>> for that attribute value rewritten in place (so the count will go from 2048 
>> to 2047 for example) with a new value for count and just the remaining IDs 
>> once the document being replaced is removed. The now unused bytes at the end 
>> are left with their previous values. Then a completely new ID list is 
>> written to the end of the file (now with the count back up to 2048 for 
>> example) as the replacement attribute is added.
>
> That's a good hint, and (as you already guessed) it's due to the
> current semantics of our replace operation [1]. As a replaced document
> may contain a completely different structure and contents, it would
> probably be tricky to replace ID lists on a lower level (instead of
> deleting and adding them). One plan to solve the issues could be a
> data structure that remembers free slots in the heap file, which can
> later be filled up with new entries.
>
>
>> [As a note: there seems to be a small bug when UPDINDEX is true in that an
>> index file is always at least 4096 bytes. When an empty database is created,
>> the index file will be 4096 zero bytes with updates appended to the end.
>> Even if you optimize, the file will be padded to 4096 bytes with zeros.]
>
> Thanks, I will remember that [2]. Maybe the minimum of 4096 bytes will
> stay, but it should definitely be overwritten from the very beginning
> when new data is inserted.
>
>
>> I'd love to be able to do everything with UPDINDEX set to true and just 
>> forget about it.
>
> Me too ;) Let's see when it can be done.
>
>
>> How fixed is the index file format? I ask because I've spent some time 
>> understanding how it works so I can read the files and see exactly what's in 
>> them. If it would be useful then I'm happy to put the information into the 
>> wiki somewhere to make it quicker for anyone else who's interested. However 
>> if you want to keep the structure obscure for any reason then I won't 
>> publish anything. Let me know.
>
> Thanks, contributions like that are always appreciated! The storage
> structure is supposed to be open to everyone. I guess you have already
> stumbled upon [3] and [4]; all edits are welcome, and may motivate
> others to think about better solutions.
>
> Christian
>
> [1] 
> https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/query/func/FNDb.java#L577-L608
> [2] https://github.com/BaseXdb/basex/issues/970
> [3] http://docs.basex.org/wiki/Storage_Layout
> [4] http://docs.basex.org/wiki/Node_Storage
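
For readers following the thread, here is a minimal sketch of the idea mentioned in the quoted message: a structure that remembers free slots in the heap file so that later entries can fill them. It is not BaseX code and makes no claim about how BaseX does or will implement this. The class `SlotHeap`, its methods, and the helper `simulate_replaces` are hypothetical names made up for illustration, and the heap file is modelled as an in-memory byte array.

```python
import bisect


class SlotHeap:
    """Toy heap file that remembers freed slots (hypothetical, not BaseX)."""

    def __init__(self):
        self.data = bytearray()  # stands in for the index heap file
        self.free = []           # sorted list of (size, offset) free slots

    def free_slot(self, offset, size):
        """Record a region as reusable instead of abandoning it."""
        bisect.insort(self.free, (size, offset))

    def write(self, payload):
        """Store payload in the smallest free slot that fits, else append."""
        size = len(payload)
        # First free slot whose size is >= the requested size (best fit).
        i = bisect.bisect_left(self.free, (size, -1))
        if i < len(self.free):
            slot_size, offset = self.free.pop(i)
            # Hand any unused tail of the slot back to the free list.
            if slot_size > size:
                self.free_slot(offset + size, slot_size - size)
        else:
            # No slot fits: grow the file, as an append-only heap would.
            offset = len(self.data)
            self.data.extend(bytes(size))
        self.data[offset:offset + size] = payload
        return offset


def simulate_replaces(rounds, list_size, reuse):
    """Model repeated delete-then-add of one ID list; return file size."""
    heap = SlotHeap()
    id_list = bytes(list_size)
    offset = heap.write(id_list)
    for _ in range(rounds):
        if reuse:
            heap.free_slot(offset, list_size)  # delete: remember the hole
        offset = heap.write(id_list)           # add: write the new list
    return len(heap.data)
```

With `reuse=False` the sketch mimics the append-only behaviour described in the thread, so the file grows with every replace; with `reuse=True` the freed region is filled again and the size stays constant. A real implementation would also have to persist the free list and merge adjacent free slots, which this sketch leaves out.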
