Hi, Nan,

Thanks for your interest in FastBit. The way you are partitioning the data is right; the problem is in the -part.txt files. Here are a few observations:

- The name may contain only alphanumeric characters and underscores: no punctuation and no special symbols (the "tv-table$" in your example breaks this rule). The data partition name in each directory must also be different from the others. The simplest thing to do is to not specify a name, in which case the name of the directory will be used as the data partition name.

- The number of rows in -part.txt must reflect the number of values in the data file in that directory. Since you have divided your DOUBLE values into 2GB files, each file contains 2GB/8B values (268435456 rows, if 2GB means 2^31 bytes).

- When you want FastBit to work with all the directories together, give the name of the directory containing all the data directories as the argument to the -d option of the ibis command line.

Hope this helps.

John

On 4/10/14, 8:59 PM, nan zhou wrote:
> hey, all,
>
> I currently have a 64GB binary data file ( one variable/one column
> data), and I want to build indexes for this file.
> The problem I had is how to do partitioning, so that fastbit can
> handle the index building.
> On the fastbit website, it has the reason why fastbit can not handle
> one large partition data, but it seems it does not have an example to
> show how to do the partitioning. Or probably it is very
> straightforward way which does not need the example. But, I just dont
> know how to deal with it. I did a way that I feel right, but it turns
> out fastbit does not like it. What I did was manually chopping the
> file into multiple 2GB binary data files, and created separate folder
> for each data. Then, I generated -part.txt file under every folder.
> The -part.txt file has same content (same variable, row number etc ).
> The reason I did this that, probably, fastbit will treat this as one
> single column/variable, but with multiple partitions. However, it
> turns out fastbit treated those files are same, so, it only kept the
> last partition. My question is how can I do it correctly? Another way
> I was imaging is only keeping one -part.txt for all the partitions,
> and specifying the partition size in the file.
> Something like:
>
> BEGIN HEADER
> Name = tv-table$
> Description = "Created on Thu Apr 10 23:23:29 EDT 2014 with 4294967296
> rows and 1 columns."
> Number_of_columns = 1
> Number_of_rows = 4294967296
> END HEADER
>
> BEGIN COLUMN
> name="tv"
> data_type="DOUBLE"
> partition_size=134217728 rows
> END COLUMN
>
> But I guess the -part.txt does not support this. Anyway, probably
> fastbit has the approach to solve it, but I did not find it.
> Would be possible you can point me to there?
>
> Thanks
>
> nan
>
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
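
For readers who want to script the layout described in the reply, here is a minimal Python sketch, not an official FastBit tool. It splits one large file of DOUBLE values into fixed-size pieces, puts each piece in its own directory, and writes a -part.txt whose Number_of_rows matches that piece (bytes written / 8). It omits the Name entry so that the directory name serves as the partition name. Several things here are assumptions: the -part.txt layout simply mirrors the example in the quoted message, the data file is assumed to be named after the column ("tv"), the directory names (part0000, part0001, ...) are hypothetical, and "2GB" is taken to mean 2^31 bytes.

```python
import os

BYTES_PER_DOUBLE = 8


def part_txt(num_rows, column="tv", data_type="DOUBLE"):
    """Build -part.txt text. No Name entry is written, so the
    directory name is expected to serve as the partition name."""
    return (
        "BEGIN HEADER\n"
        "Number_of_columns = 1\n"
        f"Number_of_rows = {num_rows}\n"
        "END HEADER\n"
        "\n"
        "BEGIN COLUMN\n"
        f'name = "{column}"\n'
        f'data_type = "{data_type}"\n'
        "END COLUMN\n"
    )


def split_into_partitions(src, out_root, chunk_bytes=2 * 1024**3, column="tv"):
    """Split src into pieces of at most chunk_bytes, one directory each.

    Returns a list of (directory, number_of_rows) pairs.
    Assumption: the data file in each directory is named after the column.
    """
    if chunk_bytes <= 0 or chunk_bytes % BYTES_PER_DOUBLE:
        raise ValueError("chunk_bytes must be a positive multiple of 8")
    block = 64 * 1024 * 1024  # copy in 64 MiB blocks to limit memory use
    parts = []
    with open(src, "rb") as fin:
        index = 0
        while True:
            data = fin.read(min(block, chunk_bytes))
            if not data:
                break  # input exhausted
            # Hypothetical names; alphanumeric only, unique per directory.
            part_dir = os.path.join(out_root, f"part{index:04d}")
            os.makedirs(part_dir, exist_ok=True)
            written = 0
            with open(os.path.join(part_dir, column), "wb") as fout:
                while data:
                    fout.write(data)
                    written += len(data)
                    remaining = chunk_bytes - written
                    data = fin.read(min(block, remaining)) if remaining else b""
            if written % BYTES_PER_DOUBLE:
                raise ValueError("input size is not a multiple of 8 bytes")
            rows = written // BYTES_PER_DOUBLE  # rows actually in this piece
            with open(os.path.join(part_dir, "-part.txt"), "w") as meta:
                meta.write(part_txt(rows, column))
            parts.append((part_dir, rows))
            index += 1
    return parts
```

Calling split_into_partitions("tv.bin", "tvdata") (both names hypothetical) would leave tvdata/part0000, tvdata/part0001, and so on; the parent directory tvdata is then the one to pass to -d of ibis. With the default piece size, every full piece holds 2^31 / 8 = 268435456 rows, and a shorter final piece gets its own smaller row count.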
