On 11/11/14 23:54, Kiran Ayyagari wrote:
> On Wed, Nov 12, 2014 at 6:40 AM, Emmanuel Lécharny <[email protected]> wrote:
>
>> Some update...
>>
>> The multi-values support is not that easy. Thanks to Shawn, who pointed
>> me to the TreeMap data structure, I was able to fix the merge sort,
>> which was a requirement for processing the tuples and their values
>> across multiple files.
>>
>> But that was just one of my problems...
>>
>> Now I realise that there is a big flaw in the way the BTree is loaded:
>> when we have multiple values, we can no longer know the number of
>> elements we have to store, so all the page-count computation is broken.
>> The consequence is that we don't know when to stop building the tree,
>> or, if we stop, we may have incomplete pages.
>>
>> The solutions:
>> - don't care about the real number of elements, and fix the boundaries
>> (ie incomplete pages at the end of each level) when the last element is
>> fetched.
>> - as we have a list of files to read elements from, use them to create
>> one single file that contains all the elements, sorted. We will then
>> know the number of elements to process, and thus build the right BTree.
>>
>> I think the second solution is better, even if it requires writing
>> another file. All in all, if we have N elements, processing the whole
>> bulk load is now a matter of reading one big file, writing into N
>> smaller files, reading all of those smaller files and writing into a
>> bigger file, then reading all the elements from this bigger file to
>> write the data into the btree. It's simpler than the other algorithm,
>> as we don't have to clean up the btree (such a cleanup would require
>> that, for each created level, we check the last page to see if it's
>> incomplete (ie < PageSize/2) and, if so, merge it with the previous
>> page and spread the result so that no page has fewer than PageSize/2
>> elements).
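The TreeMap-based merge sort the mail alludes to could look roughly like this. This is a hypothetical sketch over in-memory lists rather than the on-disk files the bulk loader reads; the names (`KWayMerge`, `merge`) are illustrative, not Mavibot code:

```java
import java.util.*;

// Sketch: k-way merge of several individually sorted runs, using a TreeMap
// to keep the current head element of each run in sorted order.
public class KWayMerge {
    public static <T extends Comparable<T>> List<T> merge(List<List<T>> runs) {
        List<T> out = new ArrayList<>();
        int[] pos = new int[runs.size()];
        // key = a head element, value = the runs currently exposing that element
        TreeMap<T, List<Integer>> heads = new TreeMap<>();

        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heads.computeIfAbsent(runs.get(i).get(0), k -> new ArrayList<>()).add(i);
            }
        }

        while (!heads.isEmpty()) {
            // Smallest pending element, possibly offered by several runs at once
            Map.Entry<T, List<Integer>> smallest = heads.pollFirstEntry();
            for (int run : smallest.getValue()) {
                out.add(smallest.getKey());
                pos[run]++;
                if (pos[run] < runs.get(run).size()) {
                    heads.computeIfAbsent(runs.get(run).get(pos[run]), k -> new ArrayList<>()).add(run);
                }
            }
        }
        return out;
    }
}
```

Duplicate values (the multi-values case) are handled naturally: a key that appears in several runs maps to all of them, and each run re-inserts its next element after being consumed.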
> +1, because there seem to be too many boundaries, and IMO handling these
> is necessarily not an optimal effort in this scenario.
>
>> This second algorithm will most of the time be faster, as we read and
>> write the big file one less time, but OTOH it will move some pages
>> around in the resulting BTree, leaving some holes we will have to track.
>> That might be an optimization for later on...
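Once the second solution produces a single sorted file, the element count N is known up front, so the page counts for every level can be computed before building the tree. A simplified sketch of that computation, assuming a uniform `pageSize` for leaves and internal nodes (real BTree nodes differ, and the names here are illustrative):

```java
import java.util.*;

// Sketch: given the total number of elements and the page capacity,
// compute how many pages each BTree level needs, from leaves to root.
public class PageLayout {
    public static List<Integer> pagesPerLevel(int nbElements, int pageSize) {
        List<Integer> levels = new ArrayList<>();
        int n = nbElements;
        while (true) {
            int pages = (n + pageSize - 1) / pageSize; // ceil(n / pageSize)
            levels.add(pages);
            if (pages == 1) {
                break; // a single page at a level is the root
            }
            n = pages; // each page feeds one entry into the level above
        }
        return levels;
    }
}
```

With the counts known in advance, the builder can stop at exactly the right moment instead of discovering incomplete pages after the fact.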
There is a better way: we don't necessarily have to flush the pages immediately. It would only require keeping two pages per level instead of one. With two pages, you always know what to do and how to spread the elements across the last page and the previous one before writing them to disk. That is a slight modification of the existing algorithm, and it saves a hell of a lot of processing. Nothing that can't be solved with a good night of sleep ;-)
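The two-pages-per-level idea could be sketched like this. This is a hypothetical in-memory stand-in for the real per-level page writer: `flushed` plays the role of disk writes, and the class name is illustrative:

```java
import java.util.*;

// Sketch: buffer the last two pages of a level. A page is only written out
// once it can no longer be affected by the end of the stream; when the
// stream ends with an under-filled last page (< pageSize/2), rebalance it
// with the previous page so no page ends up below half capacity.
public class TwoPageBuffer<T> {
    private final int pageSize;
    private final List<List<T>> flushed = new ArrayList<>(); // stands in for disk
    private List<T> prev = null;
    private List<T> curr = new ArrayList<>();

    public TwoPageBuffer(int pageSize) { this.pageSize = pageSize; }

    public void add(T element) {
        if (curr.size() == pageSize) {
            if (prev != null) {
                flushed.add(prev); // prev can no longer change: safe to write
            }
            prev = curr;
            curr = new ArrayList<>();
        }
        curr.add(element);
    }

    // Called when the last element has been fetched.
    public List<List<T>> close() {
        if (prev != null && curr.size() < pageSize / 2) {
            // Spread prev + curr over two pages of roughly equal size,
            // so both hold at least pageSize/2 elements.
            List<T> all = new ArrayList<>(prev);
            all.addAll(curr);
            int second = all.size() / 2;
            flushed.add(new ArrayList<>(all.subList(0, all.size() - second)));
            flushed.add(new ArrayList<>(all.subList(all.size() - second, all.size())));
        } else {
            if (prev != null) flushed.add(prev);
            if (!curr.isEmpty()) flushed.add(curr);
        }
        return flushed;
    }
}
```

For example, with `pageSize = 4` and elements 1..9, the naive flush would leave a last page of one element; here the last two pages come out as 3 and 2 elements, both at or above half capacity.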
