Hi, Dominique,

FastBit does not support reordering of strings at this time.  The main
reason is that reordering strings would require massive shuffling of
bytes in the data file, which would be quite slow.  Furthermore, most
of my applications don't need it so far ;-)

John


On 1/13/12 2:54 PM, Dominique Prunier wrote:
> Hey John,
> 
> I tried playing a little with reorder, but it seems that it only applies 
> to numerical columns, not CATEGORY columns.
> Is there any way to do it on CATEGORY columns?
> 
> Thanks,
> 
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]] 
> Sent: Wednesday, January 11, 2012 11:21 AM
> To: Dominique Prunier
> Cc: FastBit Users
> Subject: [Spam?]Re: [FastBit-users] FastBit binary format reference
> 
> Hi, Dominique,
> 
> The dictionary is described in
> <http://lbl.gov/~kwu/fastbit/doc/html/classibis_1_1dictionary.html>.
> The layout of a dictionary file is given with the function
> ibis::dictionary::write <http://su.pr/1ZvUr5>.
> 
> The dictionary is a bidirectional mapping between integer ids and
> their corresponding string values.  Each new string is given an
> integer id the first time the string is inserted into the dictionary.
> There is no deletion from the dictionary, and the integer value 0 is
> reserved for null strings.
> 
> The dictionary keeps a sorted version of the strings, and the dictionary
> file always stores the strings in sorted order; separate arrays
> record their corresponding integer ids.
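
The mapping described above can be sketched in memory as follows. This is
only an illustration, not FastBit's implementation: the class name
StringDictionary and its methods are hypothetical, handing out ids
sequentially starting at 1 is an assumption (the text only says that 0 is
reserved for null strings), and the sketch does not reproduce the .dic file
layout.

```python
import bisect


class StringDictionary:
    """Hypothetical in-memory model of a string <-> integer id mapping."""

    def __init__(self):
        self._by_id = [None]  # id 0 is reserved for the null string
        self._ids = {}        # string -> id
        self._sorted = []     # the same strings, kept in sorted order

    def insert(self, value):
        """Return the id of value, assigning a new id on first insertion."""
        if value is None:
            return 0
        known = self._ids.get(value)
        if known is not None:
            return known
        # ASSUMPTION: ids are handed out sequentially; entries are never deleted.
        new_id = len(self._by_id)
        self._by_id.append(value)
        self._ids[value] = new_id
        bisect.insort(self._sorted, value)
        return new_id

    def lookup(self, ident):
        """Return the string stored under an id (None for id 0)."""
        return self._by_id[ident]

    def sorted_entries(self):
        """Return (string, id) pairs in sorted string order."""
        return [(s, self._ids[s]) for s in self._sorted]
```

Keeping the sorted list next to the id table mirrors the idea that strings
are stored sorted while their ids are recorded separately.
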
> 
> In class ibis::part, there are three functions with the name reorder
> <http://su.pr/2icClc>.  Please take a look at them and see if they
> meet your needs.
> 
> John
> 
> 
> 
> On 1/10/12 3:03 PM, Dominique Prunier wrote:
>> Cool, I just finished my prototype and it works pretty well (it generates 
>> the data file and the -part.txt file).
>>
>> Now I'm interested in the .dic file format.
>>
>> So far, from what i understand, i have:
>>
>> #IBIS Dictionary
>> <some data>
>> <the NULL separated list of distinct values>
>>
>> I'm interested in what that data is. Also, it seems that the dictionary is 
>> always sorted; is that by design or just a coincidence?
>>
>> My next goal is to sort a partition according to some columns. Is there any 
>> built-in functionality in FastBit that I could use?
>>
>> Thanks,
>>
>> -----Original Message-----
>> From: K. John Wu [mailto:[email protected]] 
>> Sent: Tuesday, January 10, 2012 12:54 PM
>> To: Dominique Prunier
>> Cc: FastBit Users
>> Subject: Re: [FastBit-users] FastBit binary format reference
>>
>> Hi, Dominique,
>>
>> FastBit uses the raw binary format of the machine it is running on; there
>> is no translation of endianness.  In each word of a bit vector, the most
>> significant bit is the flag bit.
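
For a writer outside FastBit, "raw binary with no endianness translation"
could look like the sketch below. It assumes a column of 32-bit signed
integers stored in a file named after the column; the function names are
hypothetical and the accompanying -part.txt metadata is not covered here.

```python
import struct
import sys


def encode_int32_column(values):
    """Return 32-bit signed integers as raw bytes in the machine's own byte order."""
    # "=" selects native byte order with standard sizes and no padding.
    return struct.pack("=%di" % len(values), *values)


def write_column(path, values):
    """Write the raw bytes to path (ASSUMPTION: the file is named after the column)."""
    with open(path, "wb") as out:
        out.write(encode_int32_column(values))


if __name__ == "__main__":
    # Shows which byte order this machine would produce for the value 1.
    print(sys.byteorder, encode_int32_column([1]).hex())
```

A file produced this way is only portable between machines that share the
same byte order, which is the practical consequence of the point above.
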
>>
>> FastBit software has been tried on SPARC before and should work.
>> Please let me know if you find any problem with it.
>>
>> Good luck.
>>
>> John
>>
>>
>> On 1/10/12 9:39 AM, Dominique Prunier wrote:
>>> Thanks John,
>>>
>>> I'll try that very soon (maybe today). What endianness is used to write 32-bit 
>>> words? The machine's own, or a fixed one?
>>>
>>> On a related subject, would FastBit work on the SPARC architecture? What is 
>>> the support for Solaris (SPARC and/or AMD64)?
>>>
>>> Thanks,
>>>
>>> -----Original Message-----
>>> From: K. John Wu [mailto:[email protected]] 
>>> Sent: Monday, January 09, 2012 7:48 PM
>>> To: FastBit Users
>>> Cc: Dominique Prunier
>>> Subject: Re: [FastBit-users] FastBit binary format reference
>>>
>>> Hi, Dominique,
>>>
>>> The null mask is the difficult one...  Here is one way to write one
>>> out without referring to FastBit itself.
>>>
>>> One can write an uncompressed bit vector object as follows:
>>>
>>> - use an array of 32-bit unsigned integers
>>> - use only the lower 31 bits of every word
>>> - each bit corresponds to one row, for example, a.msk will be for the
>>> column named a (which is stored in the file named a)
>>> - each row with a valid value is marked with 1, and each null value is
>>> marked with 0 in the NULL mask
>>> - when the number of rows is not a multiple of 31, the remainder is
>>> stored in a separate 32-bit word, followed by another 32-bit word to
>>> indicate the number of bits used in the preceding word
>>> - when the number of rows is a multiple of 31, the sequence is
>>> terminated with a 32-bit word of 0
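
The steps in the list above can be sketched in Python. This is only an
illustration under stated assumptions: the function name pack_null_mask is
hypothetical, words are written in the machine's native byte order (as noted
elsewhere in this thread), and the order of rows inside a word (first row in
the highest of the bits used) is an assumption that the list does not settle.
It is worth comparing the output against a mask written by FastBit itself
before relying on it.

```python
import struct

BITS_PER_WORD = 31  # only the lower 31 bits of each 32-bit word hold row bits


def pack_null_mask(valid):
    """Pack booleans (True = valid value, False = null) into mask bytes."""
    valid = [bool(v) for v in valid]
    words = []
    full = len(valid) - len(valid) % BITS_PER_WORD
    for start in range(0, full, BITS_PER_WORD):
        word = 0
        for bit in valid[start:start + BITS_PER_WORD]:
            # ASSUMPTION: earlier rows end up in more significant bits.
            word = (word << 1) | int(bit)
        words.append(word)
    rest = valid[full:]
    if rest:
        word = 0
        for bit in rest:
            word = (word << 1) | int(bit)
        words.append(word)  # partially filled word holding the remainder
    # Number of bits used in the preceding word, or the terminating 0.
    words.append(len(rest))
    return struct.pack("=%dI" % len(words), *words)
```

Under these assumptions, the bytes for column a would be written to a file
named a.msk, for example with open("a.msk", "wb").write(pack_null_mask(flags)),
where flags is a hypothetical list with one boolean per row.
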
>>>
>>> Let me know if you have any questions.
>>>
>>> John
>>>
>>>
>>>
>>>
>>> On 1/9/12 1:16 PM, Dominique Prunier wrote:
>>>> Thanks John,
>>>>
>>>> It seems that tests/setqgen.cpp uses the tablex object from the library, 
>>>> but tests/readcsv.cpp seems to do what I'm looking for.
>>>> However, it is missing an example of a null mask. Is it possible to write one 
>>>> without using the library (the .msk format doesn't seem straightforward)?
>>>>
>>>> Thanks,
>>>>
>>>> -----Original Message-----
>>>> From: K. John Wu [mailto:[email protected]] 
>>>> Sent: Monday, January 09, 2012 3:59 PM
>>>> To: FastBit Users
>>>> Cc: Dominique Prunier
>>>> Subject: Re: [FastBit-users] FastBit binary format reference
>>>>
>>>> Hi, Dominique,
>>>>
>>>> The program tests/setqgen.cpp is a C++ program that generates on-disk
>>>> data without depending on the rest of the FastBit code.  Another example
>>>> is tests/readcsv.cpp.
>>>>
>>>> Roughly, a data file is named after the column name.  A directory
>>>> contains a "data partition" with a metadata file named '-part.txt'
>>>> (note the dash in the front of the file name).  The metadata file
>>>> contains names of the columns and their types.
>>>>
>>>> Hope this helps.
>>>>
>>>> John
>>>>
>>>>
>>>> On 1/9/12 12:39 PM, Dominique Prunier wrote:
>>>>> Hi,
>>>>>
>>>>>  
>>>>>
>>>>> I was wondering if a reference of the on-disk format exists somewhere.
>>>>>
>>>>> My goal is to be able to create a partition using an external
>>>>> lightweight process without any dependency on the FastBit library.
>>>>>
>>>>> I already know some basics from
>>>>> http://crd-legacy.lbl.gov/~kewu/fastbit/doc/dataLoading.html but it
>>>>> doesn't explain much about .dic files (for CATEGORY strings), the
>>>>> required fields in the -part.txt file, or the format of the null mask.
>>>>>
>>>>>  
>>>>>
>>>>> First of all, I don't know whether this is a recommended way of doing
>>>>> things (for instance, if the on-disk format changes often). If it is
>>>>> possible to create partitions this way, I'd like to know where I can
>>>>> find the most up-to-date information.
>>>>>
>>>>>  
>>>>>
>>>>> Thanks,
>>>>>
>>>>>  
>>>>>
>>>>> Dominique Prunier
>>>>> APG Lead Developer
>>>>> 4388, rue Saint-Denis
>>>>> Bureau 309
>>>>> Montreal (Quebec)  H2J 2L1
>>>>> Tel. +1 514-842-6767  x310
>>>>> Fax +1 514-842-3989
>>>>> [email protected] <mailto:[email protected]>
>>>>> www.watch4net.com <http://www.watch4net.com/>
>>>>>
>>>>> _______________________________________________
>>>>> FastBit-users mailing list
>>>>> [email protected]
>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
