When the data is stored in the column's vector, it is written in the column's native binary data type rather than as text. A 64-bit integer with 19 digits takes 19 bytes in a text file (plus a delimiter) but only 8 bytes (64 bits) on disk. This likely accounts for the storage size difference you measured.
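To make the arithmetic concrete, here is a small sketch that compares the two encodings for a few 19-digit values. It does not use FastBit at all; the helper names text_size and binary_size are made up for this illustration, and the only assumption is that the text file holds one decimal value per line.

```python
import struct


def text_size(values):
    """Bytes needed to write the values as decimal text, one per line."""
    # +1 accounts for the newline (or comma) that separates values in a text file.
    return sum(len(str(v)) + 1 for v in values)


def binary_size(values, fmt="q"):
    """Bytes needed to write the values as fixed-width native integers.

    fmt is a struct format code: "q" is a 64-bit signed int, "i" a 32-bit one.
    """
    return struct.calcsize("<" + fmt) * len(values)


# Three hypothetical 19-digit values, each of which fits in a signed 64-bit int.
values = [9223372036854775807, 1234567890123456789, 1000000000000000000]

print("text bytes:  ", text_size(values))
print("binary bytes:", binary_size(values))
```

The saving depends on the data. A column of single-digit values stored as 64-bit ints would be larger in binary than in text, so the ratio you see for your CSV reflects how wide your values are relative to the column types you chose.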
--justin

Sent from my iPhone

> On Jul 15, 2015, at 9:19 AM, Mazen Kachmar <[email protected]> wrote:
>
> Actually, I think I was a bit hasty. I don't think FastBit is doing any
> compression on the original data. I was looking at the wrong files when I
> made my measurements.
>
> Please correct me if I am wrong, but I think FastBit does not do any
> compression on the original data. It would actually be interesting to do this
> as it might make it faster to run queries. Has anyone attempted this before?
>
> Thanks,
> Mazen
>
> From: [email protected]
> To: [email protected]
> Subject: RE: [FastBit-users] Does FastBit need to use filesystem?
> Date: Wed, 15 Jul 2015 15:29:03 +0000
>
> Hi John,
> I have a follow-up clarification question on this thread please.
>
> I noticed that the binary data files that FastBit produces are much smaller in
> size than the original data. My CSV file is 42.8 MB, however the total size
> of the binary files representing all the columns in the CSV file that FastBit
> produced is about 24 MB.
>
> When FastBit transforms the data into its own format, does it compress the
> data as well? If so, what compression scheme is it using?
>
> Thank you,
> Mazen
>
> > From: [email protected]
> > To: [email protected]
> > Date: Tue, 7 Jul 2015 11:36:36 -0700
> > Subject: Re: [FastBit-users] Does FastBit need to use filesystem?
> >
> > Hi, Mazen,
> >
> > Thanks for your interest in FastBit software, and thanks Justin and
> > Teryl for chiming in. The following is a bit more information.
> >
> > If you know how to feed the necessary data to FastBit, it is possible
> > to keep your data in your own format. The FastQuery work is one
> > attempt at this <http://www-vis.lbl.gov/Events/SC05/HDF5FastQuery/>.
> > There was also an attempt at doing indexing and querying "in situ"
> > <http://crd-legacy.lbl.gov/~kewu/ps/LBNL-5280E.html>, i.e., keeping
> > everything in memory.
> >
> > Feel free to let us know if you need any additional information.
> >
> > John
> >
> >
> > On 7/6/15 3:47 AM, Mazen Kachmar wrote:
> > > Thanks Justin and Teryl.
> > >
> > > Justin, I think you are right. After reading this
> > > paper http://crd-legacy.lbl.gov/~kewu/ps/LBNL-59952.pdf, things are a
> > > bit more clear to me. FastBit needs to store the data in its own
> > > binary form to do vertical partitioning and support the binning option.
> > >
> > > Teryl, thanks for the suggestion. Yes, it should be possible to create
> > > a conceptual RAM disk. I was just wondering if FastBit actually has
> > > that option. It does not look like it. I might actually implement this.
> > >
> > > John, once you have a moment, it would be great to get your
> > > confirmation/blessing of this small discussion.
> > >
> > > Thanks!
> > > Mazen
> > >
> > > ----------------------------------------------------------------------
> > > Date: Sat, 4 Jul 2015 14:03:50 -0400
> > > From: [email protected]
> > > To: [email protected]
> > > Subject: Re: [FastBit-users] Does FastBit need to use filesystem?
> > >
> > > Have you tried making a RAM disk to store the results? That way you
> > > can keep everything in memory and still support FastBit's format.
> > >
> > > Cheers
> > >
> > > Teryl
> > >
> > > On Jul 4, 2015 2:00 PM, "Justin Swanhart" <[email protected]> wrote:
> > >
> > >     Hi,
> > >
> > >     John can correct me if I'm wrong but I think it is because the
> > >     indexes can be binned, so the actual value must be retrieved for
> > >     the column.
> > >
> > >     --justin
> > >
> > >     Sent from my iPhone
> > >
> > >     On Jul 4, 2015, at 9:47 AM, Mazen Kachmar <[email protected]> wrote:
> > >
> > >         Dear John,
> > >         Thank you in advance for your time. I am a bit of a novice in
> > >         this area, so my apologies for the quality of my question. I
> > >         studied FastBit quite a bit in the last couple of days and
> > >         stepped through the tcapi example in particular. Let me refine
> > >         my question below a bit.
> > >
> > >         It looks like FastBit needs to serialize the index information
> > >         into the filesystem. This is fine. However, I also noticed
> > >         that it serializes a transformation of the original data
> > >         as well (I was looking at fastbit_add_values). Is this true?
> > >         If so, why is FastBit duplicating the original data (in a
> > >         different form)? In other words, why can't FastBit use the
> > >         compressed index information to map to the original data?
> > >
> > >         Thank you,
> > >         Mazen
> > >
> > >         ----------------------------------------------------------------------
> > >         From: [email protected]
> > >         To: [email protected]
> > >         Date: Fri, 3 Jul 2015 12:27:53 +0000
> > >         Subject: [FastBit-users] Does FastBit need to use filesystem?
> > >
> > >         Hi,
> > >         I was reading
> > >         on https://sdm.lbl.gov/~kewu/fastbit/doc/quickstart.html that
> > >         FastBit partitions the data in a directory in the filesystem.
> > >         Is this the only way to use FastBit? In other words, can
> > >         FastBit be used as an in-memory database? My data is in
> > >         memory and I don't wish to serialize it to the hard disk.
> > >
> > >         Thanks,
> > >         Mazen
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
