Hi, Andrew,

Depending on whether your perl script produces CSV files or FastBit
binary files, there are different options.

If you produce CSV files, an empty field (nothing but the comma
delimiter for that column) is taken to indicate a NULL value.  The
program ardea.cpp is designed to recognize this and generate the
appropriate NULL masks.  In a CSV file, either "" or '' indicates an
empty string.  In most cases, an empty string is effectively a NULL
value.
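In other words, a converter only has to leave a field empty to mark a
NULL.  A minimal sketch in Python (standing in for perl here); the helper
name rows_to_csv is hypothetical.  Python's csv.writer writes None as an
empty field, which ardea should then read as NULL:

```python
import csv
import io

def rows_to_csv(rows):
    """Render rows as CSV text; None becomes an empty field (read as NULL)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        # csv.writer emits None as an empty string, i.e. ",," in the output
        writer.writerow(row)
    return buf.getvalue()

print(rows_to_csv([(1, 2.5, "abc"), (2, None, "def"), (3, 4.0, None)]))
```

The second row comes out as "2,,def" and the third as "3,4.0,", so the
middle and last columns of those rows would get NULL entries in the mask.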

If you are producing FastBit binary files, you can produce .msk files
as follows.  Use unsigned 32-bit words: use the lower 31 bits of each
word and leave the most significant bit as 0.  Record a valid value as
1 and a null value as 0.  Place the bits from the more significant
position to the less significant position, so the first row of a word
goes in bit 30 and the 31st row in bit 0.  Each full word represents
the status of 31 rows; any remainder needs two more words.  Say there
are k rows left: you will need one word to record the value k and
another word to record the values of these k bits.  In a .msk file,
the word recording k is the last word and the k bits are placed in the
second-to-last word.  The last k bits are stored in the lowest k
positions of the second-to-last word.
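The layout described above can be sketched in Python as follows.  The
function names msk_words and write_msk are hypothetical.  Two points are
assumptions rather than anything stated above: the file is written in
the host's native byte order, and when the row count is an exact
multiple of 31 the sketch still appends a count word of 0 (with no
partial word in front of it).

```python
import struct

def msk_words(valid):
    """Return the 32-bit words of a .msk file.

    valid: sequence of booleans, one per row (True = valid, False = NULL).
    """
    words = []
    n_full = len(valid) // 31
    # Each full word holds 31 rows: the first row lands in bit 30 and the
    # 31st row in bit 0, so the most significant bit (bit 31) stays 0.
    for start in range(0, 31 * n_full, 31):
        word = 0
        for flag in valid[start:start + 31]:
            word = (word << 1) | (1 if flag else 0)
        words.append(word)
    # The k leftover rows (0 <= k < 31) go into the lowest k bits of one
    # more word, still ordered from more to less significant.
    tail = valid[31 * n_full:]
    k = len(tail)
    if k > 0:
        word = 0
        for flag in tail:
            word = (word << 1) | (1 if flag else 0)
        words.append(word)
    # Last word: the number of leftover rows k.
    # Assumption: this word is written even when k == 0.
    words.append(k)
    return words

def write_msk(path, valid):
    """Write the mask words to path as raw unsigned 32-bit integers."""
    words = msk_words(valid)
    with open(path, "wb") as f:
        # "=" selects native byte order with standard 4-byte "I" values
        f.write(struct.pack(f"={len(words)}I", *words))
```

For 35 rows where rows 2 and 34 (indices 1 and 33) are NULL, this gives
three words: 0x5FFFFFFF for the first 31 rows (bit 29 cleared), 0b1101
for the last 4 rows, and 4 as the trailing count.  The words are plain
uncompressed literals with the top bit clear.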

Hope this helps.

John



On 8/29/12 8:50 AM, Olson, Andrew wrote:
> I've been converting text files to FastBit partitions in perl and I
> need to be able to create a .msk file because I have some null
> values.  What is the format of the .msk file?  Is it WAH
> compressed?  If so, does FastBit replace an uncompressed .msk file
> automatically?  If not, can ardea produce this for me?
> 
> Andrew
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
