Hi, Anderson, I am not sure that I have a good answer for you, but here are some guesses.
One possibility is that the number of distinct values is much larger in this CSV file. For example, one of the categorical columns might have many more distinct categories, or the double column might have values that never repeat. This would cause FastBit to waste more memory: with many distinct values, FastBit needs to create a lot of small bitvector objects. The total size of these small objects should be bounded, but they can still cause more memory to be occupied. However, this is only speculation. I cannot tell much from the -part.txt file itself. I am willing to take a look at the actual CSV file if you are able to share it with me.

John

On 5/28/11 4:50 PM, Anderson C. Carniel wrote:
> Hi John,
>
> Thanks for the quick response. I tested it with two CSV files, and the
> index was again not successfully constructed. At the end of this email
> is the -part.txt file of this partition, whose index creation was
> interrupted.
> Then I tested it with only one CSV file, and the index was successfully
> constructed.
>
> What I find odd is that this occurs only with these data. With other
> CSV files that have the same column structure and the same kind of data,
> it was possible to build a data partition containing up to 17 million
> lines without problems.
> Thanks for the help.
> Best regards
>
> File -part.txt:
> # metadata file written by ibis::part::writeMetaData
> # on Sat May 28 23:34:00 2011 UTC
>
> BEGIN HEADER
> Name = "teste4"
> Description = "/opt/fastbit-ibis1.2.3/examples/.libs/lt-ardea -d /teste4 -m
> col5:key,col4:key,col7:key,col6:key,col1:double,col0:double,col3:key,col2:int
> -t /teste4/csv0.csv -t /teste4/csv1.csv"
> Number_of_columns = 8
> Number_of_rows = 12876900
> Timestamp = 1306625640
> State = 1
> index = <bining none/> <encoding equality/>
> END HEADER
>
> Begin Column
> name = "col0"
> data_type = "DOUBLE"
> End Column
>
> Begin Column
> name = "col1"
> data_type = "DOUBLE"
> End Column
>
> Begin Column
> name = "col2"
> data_type = "INT"
> End Column
>
> Begin Column
> name = "col3"
> description = col3
> data_type = "CATEGORY"
> minimum = 0
> maximum = 9223372036854775808
> End Column
>
> Begin Column
> name = "col4"
> description = col4
> data_type = "CATEGORY"
> minimum = 0
> maximum = 9223372036854775808
> End Column
>
> Begin Column
> name = "col5"
> description = col5
> data_type = "CATEGORY"
> minimum = 0
> maximum = 9223372036854775808
> End Column
>
> Begin Column
> name = "col6"
> description = col6
> data_type = "CATEGORY"
> minimum = 0
> maximum = 9223372036854775808
> End Column
>
> Begin Column
> name = "col7"
> description = col7
> data_type = "CATEGORY"
> minimum = 0
> maximum = 9223372036854775808
> End Column
>
> > Date: Thu, 26 May 2011 11:15:02 -0700
> > From: [email protected]
> > To: [email protected]
> > CC: [email protected]
> > Subject: Re: [FastBit-users] Problema with ibis in size partition
> >
> > Hi, Anderson,
> >
> > The core limitation of FastBit is that when building indexes at least
> > one column and its corresponding index must fit into memory. Since
> > you have about 44 million rows, holding a double-precision column in
> > memory takes about 350 MB. The size of the corresponding index is likely
> > about the same size -- however, because the memory is allocated in
> > relatively small chunks (especially if there are many distinct values
> > in the data), there is likely a lot of waste.
> > The more distinct values there are, the more waste there will be.
> > For double-precision values, especially those computed from
> > simulations, there are many distinct values.
> >
> > With that explanation, here are two suggestions for dealing with the
> > problem. One suggestion is to break the data into smaller partitions.
> > For example, convert each CSV file into its own data partition.
> >
> > Since the total volume is relatively small, another possibility is to
> > tell FastBit to use more memory. By default, FastBit will use half of
> > the physical memory. You can tell it to use more memory by using a
> > configuration parameter called fileManager.maxBytes. The easiest way
> > to get ibis to read this parameter is to put the following line in a
> > file named ibis.rc in the current working directory.
> >
> > fileManager.maxBytes = 1.5GB
> >
> > Hope these help.
> >
> > John
> >
> >
> > On 5/26/11 8:51 AM, Anderson C. Carniel wrote:
> > > Hi John!
> > >
> > > I'm using FastBit 1.2.3. I have 5 CSV files; each CSV file has
> > > 6,438,450 rows and is about 460 MB. These data are organized into
> > > eight columns, on which I build the data partitions without problems,
> > > as follows:
> > >
> > > /opt/fastbit-ibis1.2.3/examples/ardea -d /test/agg/index0 -m
> > > "col5:key,col4:key,col7:key,col6:key,col1:double,col0:double,col3:key,col2:int"
> > > -t /test/agg/csv0.csv -t /test/agg/csv1.csv -t /test/agg/csv2.csv
> > >
> > > /opt/fastbit-ibis1.2.3/examples/ardea -d /test/agg/index1 -m
> > > "col5:key,col4:key,col7:key,col6:key,col1:double,col0:double,col3:key,col2:int"
> > > -t /test/agg/csv3.csv -t /test/agg/csv4.csv
> > >
> > > But when I build the index:
> > >
> > > /opt/fastbit-ibis1.2.3/examples/ibis -d /test/agg/index0 -b "<bining
> > > none/> <encoding equality/>"
> > >
> > > ibis consumes all available memory, swaps heavily, and does not
> > > complete the construction. This operation has been running for about
> > > 15 hours.
> > >
> > > My machine has 2 GB of RAM, which by my calculations should support
> > > up to 44,564,480 lines when building the index. But even using only
> > > about 19 million lines for the first partition, ibis was unable to
> > > build the index.
> > >
> > > What could be the problem?
> > >
> > > Thanks for the help.
> > >
> > > Best regards
> > >
> > > []s
> > >
> > > _______________________________________________
> > > FastBit-users mailing list
> > > [email protected]
> > > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
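
One way to check the distinct-value guess from the 5/28 reply is to count the distinct values in each column of the CSV files. What follows is only a minimal sketch in Python, not part of FastBit. It assumes comma-delimited files with no header row and the eight columns named in the ardea -m option; the function name count_distinct is made up for illustration.

```python
import csv
from typing import Iterable, List


def count_distinct(paths: Iterable[str], ncols: int = 8) -> List[int]:
    """Return the number of distinct (string) values seen in each column.

    Assumes comma-delimited input without a header row. Values are
    compared as text, so "1.0" and "1.00" count as two distinct values.
    """
    seen = [set() for _ in range(ncols)]
    for path in paths:
        with open(path, newline="") as fh:
            for row in csv.reader(fh):
                # Ignore any fields beyond the expected number of columns.
                for i, value in enumerate(row[:ncols]):
                    seen[i].add(value.strip())
    return [len(s) for s in seen]
```

Calling count_distinct(["/teste4/csv0.csv", "/teste4/csv1.csv"]) would return one count per column, in file order. The sketch keeps every distinct value in memory, so on a 2 GB machine it may be safer to run it on one file at a time, or to adapt it to examine a single column per pass.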

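
The "about 350 MB" figure in the 5/26 reply follows from 8 bytes per double-precision value. As a rough sketch, the arithmetic can be written out as below. It covers only the raw column data, not the index or any allocation overhead, and the function name raw_column_bytes is made up for illustration.

```python
def raw_column_bytes(rows: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to hold one column in memory (8 bytes per double)."""
    return rows * bytes_per_value


# Row counts taken from the thread: the estimated 2 GB limit and the
# two-file partition "teste4".
for rows in (44_564_480, 12_876_900):
    size = raw_column_bytes(rows)
    print(f"{rows:,} rows: {size / 1e6:.0f} MB ({size / 2**20:.0f} MiB)")
```

For 44,564,480 rows this gives 356,515,840 bytes, in line with the estimate quoted in the thread. The two-file partition with 12,876,900 rows needs only about 103 MB per double column, so its raw column size alone does not explain the memory exhaustion.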