When the data is stored in the column's vector, a native data type is used for 
storage.  A 64-bit int with 19 digits takes 19 bytes in a text file but only 8 
bytes (64 bits) in native binary form.  This likely accounts for the difference 
in storage size.
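
As a rough illustration (a sketch using only Python's standard struct module, 
not FastBit code; the sample value is made up for the example), the same 
19-digit integer can be measured both ways:

```python
import struct

# Hypothetical sample value: the largest signed 64-bit integer (19 digits).
value = 9223372036854775807

# Size as text, e.g. one field in a CSV file (delimiter not counted).
text_bytes = len(str(value).encode("ascii"))

# Size as a native signed 64-bit integer.
# "<" selects little-endian with standard sizes; "q" is a C long long.
binary_bytes = len(struct.pack("<q", value))

print(text_bytes, binary_bytes)  # prints "19 8"
```

Small values go the other way: a single-digit integer stored as a 64-bit value 
still takes 8 bytes, so the overall ratio depends on the data and on the column 
types chosen.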

--justin

> On Jul 15, 2015, at 9:19 AM, Mazen Kachmar <[email protected]> wrote:
> 
> Actually, I think I was a bit hasty. I was looking at the wrong files when I 
> made my measurements. 
> 
> Please correct me if I am wrong, but I think FastBit does not do any 
> compression on the original data. It would actually be interesting to do this, 
> as it might make queries faster to run. Has anyone attempted this before?
> 
> Thanks,
> Mazen
> 
> From: [email protected]
> To: [email protected]
> Subject: RE: [FastBit-users] Does FastBit need to use filesystem?
> Date: Wed, 15 Jul 2015 15:29:03 +0000
> 
> Hi John,
> I have a follow-up clarification question on this thread please.
> 
> I noticed that the binary data files that FastBit produces are much smaller 
> in size than the original data. My CSV file is 42.8 MB, but the total size of 
> the binary files that FastBit produced for all the columns in the CSV file is 
> about 24 MB. 
> 
> When FastBit transforms the data into its own format, does it compress the 
> data as well? If so, what compression scheme is it using? 
> 
> Thank you,
> Mazen
> 
> > From: [email protected]
> > To: [email protected]
> > Date: Tue, 7 Jul 2015 11:36:36 -0700
> > Subject: Re: [FastBit-users] Does FastBit need to use filesystem?
> > 
> > Hi, Mazen,
> > 
> > Thanks for your interest in FastBit software, and thanks Justin and
> > Teryl for chiming in. The following is a bit more information.
> > 
> > If you know how to feed the necessary data to FastBit, it is possible
> > to keep your data in your own format. The FastQuery work is one
> > attempt at this <http://www-vis.lbl.gov/Events/SC05/HDF5FastQuery/>.
> > There was also an attempt at doing indexing and querying "in situ"
> > <http://crd-legacy.lbl.gov/~kewu/ps/LBNL-5280E.html>, i.e., keeping
> > everything in memory.
> > 
> > Feel free to let us know if you need any additional information.
> > 
> > John
> > 
> > 
> > On 7/6/15 3:47 AM, Mazen Kachmar wrote:
> > > Thanks Justin and Teryl. 
> > > 
> > > Justin, I think you are right. After reading this
> > > paper http://crd-legacy.lbl.gov/~kewu/ps/LBNL-59952.pdf, things are a
> > > bit more clear to me. FastBit needs to store the data in its own
> > > binary form to do vertical partitioning and support the binning option.
> > > 
> > > Teryl, thanks for the suggestion. Yes, it should be possible to create
> > > a conceptual RAM disk. I was just wondering if FastBit actually has
> > > that option. It does not look like it. I might actually implement this.
> > > 
> > > John, once you have a moment, it would be great to get your
> > > confirmation/blessing of this small discussion.
> > > 
> > > Thanks!
> > > Mazen 
> > > 
> > > ----------------------------------------------------------------------
> > > Date: Sat, 4 Jul 2015 14:03:50 -0400
> > > From: [email protected]
> > > To: [email protected]
> > > Subject: Re: [FastBit-users] Does FastBit need to use filesystem?
> > > 
> > > > Have you tried making a RAM disk to store the results? That way you
> > > > can keep everything in memory and still support FastBit's format.
> > > 
> > > Cheers
> > > 
> > > Teryl
> > > 
> > > > On Jul 4, 2015 2:00 PM, "Justin Swanhart" <[email protected]> wrote:
> > > 
> > > Hi, 
> > > 
> > > John can correct me if I'm wrong but I think it is because the
> > > indexes can be binned, so the actual value must be retrieved for
> > > the column.
> > > 
> > > --justin
> > > 
> > > Sent from my iPhone
> > > 
> > > > On Jul 4, 2015, at 9:47 AM, Mazen Kachmar <[email protected]> wrote:
> > > 
> > > Dear John,
> > > > Thank you in advance for your time. I am a bit of a novice in
> > > > this area, so my apologies for the quality of my question. I
> > > studied FastBit quite a bit in the last couple of days and
> > > stepped through the tcapi example in particular. Let me refine
> > > my question below a bit.
> > > 
> > > > It looks like FastBit needs to serialize the index information
> > > > into the filesystem. This is fine. However, I noticed that it
> > > > also serializes a transformation of the original data
> > > > (I was looking at fastbit_add_values). Is this true?
> > > If so, why is FastBit duplicating the original data (in a
> > > different form)? In other words, why can't FastBit use the
> > > compressed index information to map to the original data?
> > > 
> > > Thank you,
> > > Mazen
> > > 
> > > ----------------------------------------------------------------------
> > > > From: [email protected]
> > > > To: [email protected]
> > > Date: Fri, 3 Jul 2015 12:27:53 +0000
> > > Subject: [FastBit-users] Does FastBit need to use filesystem?
> > > 
> > > Hi,
> > > I was reading
> > > on https://sdm.lbl.gov/~kewu/fastbit/doc/quickstart.html that
> > > FastBit partitions the data in a directory in the filesystem.
> > > > Is this the only way to use FastBit? In other words, can
> > > > FastBit be used as an in-memory database? My data is in
> > > > memory and I don't wish to serialize it to the hard disk.
> > > 
> > > Thanks,
> > > Mazen
> > > 
> > > > _______________________________________________
> > > > FastBit-users mailing list
> > > > [email protected]
> > > > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> > > 
> > > _______________________________________________
> > > FastBit-users mailing list
> > > [email protected] <mailto:[email protected]>
> > > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> > > 
> > > 
> > > _______________________________________________
> > > FastBit-users mailing list
> > > [email protected] <mailto:[email protected]>
> > > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> > > 
> > > 
> > > _______________________________________________ FastBit-users mailing
> > > list [email protected]
> > > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> > > 
> > > 
> > > _______________________________________________
> > > FastBit-users mailing list
> > > [email protected]
> > > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> > > 
> > _______________________________________________
> > FastBit-users mailing list
> > [email protected]
> > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
