Hi, Anderson,

I am not sure that I have a good answer for you, but here are some 
guesses.

One possibility is that the number of distinct values is much larger in 
this CSV file.  For example, one of the categorical columns might have 
many more distinct categories, or a double column might have values 
that never repeat.  This would cause FastBit to waste more memory: 
many distinct values mean FastBit needs to create many small bitvector 
objects.  The total size of these small objects should be bounded, but 
allocating them can cause more memory to be occupied.
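
One rough way to check this guess is to count the distinct values in 
each column of the CSV file and compare the counts with a file that 
indexes without problems.  The Python sketch below is only an 
illustration and is not part of FastBit: the function name 
count_distinct is made up here, and it assumes a comma-separated file 
with no header row.

```python
import csv
from collections import defaultdict


def count_distinct(csv_path, columns=None, delimiter=","):
    """Return {column_index: number_of_distinct_values} for a CSV file.

    Hypothetical helper (not a FastBit API). Assumes no header row.
    `columns` optionally restricts the count to the given 0-based indexes.
    """
    seen = defaultdict(set)
    with open(csv_path, newline="") as f:
        for row in csv.reader(f, delimiter=delimiter):
            for i, value in enumerate(row):
                if columns is None or i in columns:
                    # Values are compared as text, so "1.0" and "1.00"
                    # count as two different values.
                    seen[i].add(value.strip())
    return {i: len(seen[i]) for i in sorted(seen)}


# Example call (path taken from the -part.txt description):
# print(count_distinct("/teste4/csv0.csv"))
```

The script keeps every distinct value in memory, so on a machine with 
little RAM it may be safer to pass one column index at a time through 
the columns argument.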

However, this is only speculation.  I cannot tell much from the 
-part.txt file itself.  I am willing to take a look at the actual CSV 
file if you are able to share it with me.

John


On 5/28/11 4:50 PM, Anderson C. Carniel wrote:
> Hi John,
>
> Thanks for the quick response. I tested it with two CSV files, and
> the index was still not constructed successfully. The -part.txt file
> of this partition, for which the index creation process was
> interrupted, follows at the end of this email.
> Then I tested it with only one CSV file, and the index was
> constructed successfully.
>
> What I find odd is that this occurs only with these data. With other
> CSV files that have the same column structure and the same kind of
> data, it was possible to build a data partition containing up to 17
> million lines without problems.
> Thanks for the help.
> Best regards
> File -part.txt:
> # metadata file written by ibis::part::writeMetaData
> # on Sat May 28 23:34:00 2011 UTC
>
> BEGIN HEADER
> Name = "teste4"
> Description = "/opt/fastbit-ibis1.2.3/examples/.libs/lt-ardea -d /teste4 -m col5:key,col4:key,col7:key,col6:key,col1:double,col0:double,col3:key,col2:int -t /teste4/csv0.csv -t /teste4/csv1.csv"
> Number_of_columns = 8
> Number_of_rows = 12876900
> Timestamp = 1306625640
> State = 1
> index = <bining none/> <encoding equality/>
> END HEADER
>
> Begin Column
> name = "col0"
> data_type = "DOUBLE"
> End Column
>
> Begin Column
> name = "col1"
> data_type = "DOUBLE"
> End Column
>
> Begin Column
> name = "col2"
> data_type = "INT"
> End Column
>
> Begin Column
> name = "col3"
> description = col3
> data_type = "CATEGORY"
> minimum = 0
> maximum = 9223372036854775808
> End Column
>
> Begin Column
> name = "col4"
> description = col4
> data_type = "CATEGORY"
> minimum = 0
> maximum = 9223372036854775808
> End Column
>
> Begin Column
> name = "col5"
> description = col5
> data_type = "CATEGORY"
> minimum = 0
> maximum = 9223372036854775808
> End Column
>
> Begin Column
> name = "col6"
> description = col6
> data_type = "CATEGORY"
> minimum = 0
> maximum = 9223372036854775808
> End Column
>
> Begin Column
> name = "col7"
> description = col7
> data_type = "CATEGORY"
> minimum = 0
> maximum = 9223372036854775808
> End Column
>
>
>
>  > Date: Thu, 26 May 2011 11:15:02 -0700
>  > From: [email protected]
>  > To: [email protected]
>  > CC: [email protected]
>  > Subject: Re: [FastBit-users] Problema with ibis in size partition
>  >
>  > Hi, Anderson,
>  >
>  > The core limitation of FastBit is that when building indexes at least
>  > one column and its corresponding index must fit into memory. Since
>  > you have about 44 million rows, holding a double-precision column in
>  > memory takes about 350 MB. The size of the corresponding index is
>  > likely about the same -- however, because the memory is allocated in
>  > relatively small chunks (especially if there are many distinct values
>  > in the data), there is likely a lot of waste. The more distinct
>  > values there are, the more waste there will be. For double precision
>  > values, especially those computed from simulations, there are many
>  > different distinct values.
>  >
>  > With that explanation, here are two suggestions for dealing with the
>  > problem. One suggestion is to break the data into smaller partitions.
>  > For example convert each CSV file into a data partition.
>  >
>  > Since the total volume is relatively small, another possibility is to
>  > tell FastBit to use more memory. By default, FastBit will use half of
>  > the physical memory. You can tell it to use more memory by using a
>  > configuration parameter called fileManager.maxBytes. The easiest way
>  > to get ibis to read this parameter is to put the following line in a
>  > file named ibis.rc in the current working directory.
>  >
>  > fileManager.maxBytes = 1.5GB
>  >
>  > Hope these help.
>  >
>  > John
>  >
>  >
>  > On 5/26/11 8:51 AM, Anderson C. Carniel wrote:
>  > > Hi John!
>  > >
>  > > I'm using fastbit 1.2.3. I have 5 CSV files, each csv file has
>  > > 6,438,450 rows and about 460 MB. These data are organized into eight
>  > > columns on which I build the data partition without problems, as
>  > > follows:
>  > >
>  > > /opt/fastbit-ibis1.2.3/examples/ardea -d /test/agg/index0 \
>  > >     -m "col5:key,col4:key,col7:key,col6:key,col1:double,col0:double,col3:key,col2:int" \
>  > >     -t /test/agg/csv0.csv -t /test/agg/csv1.csv -t /test/agg/csv2.csv
>  > >
>  > > /opt/fastbit-ibis1.2.3/examples/ardea -d /test/agg/index1 \
>  > >     -m "col5:key,col4:key,col7:key,col6:key,col1:double,col0:double,col3:key,col2:int" \
>  > >     -t /test/agg/csv3.csv -t /test/agg/csv4.csv
>  > >
>  > > But when I build the index:
>  > >
>  > > /opt/fastbit-ibis1.2.3/examples/ibis -d /test/agg/index0 \
>  > >     -b "<bining none/> <encoding equality/>"
>  > >
>  > > ibis consumes all available memory, swaps heavily, and does not
>  > > complete the construction. This operation has been running for
>  > > about 15 hours.
>  > >
>  > > My machine has 2 GB of RAM, which by my calculations should support
>  > > up to 44,564,480 lines when building the index. But even with only
>  > > about 19 million lines in the first partition, ibis was unable to
>  > > build the index.
>  > >
>  > > What could be the problem?
>  > >
>  > > Thanks for the help.
>  > >
>  > > Best regards
>  > >
>  > > []s
>  > >
>  > >
>  > >
>  > > _______________________________________________
>  > > FastBit-users mailing list
>  > > [email protected]
>  > > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
