Hi Wes,

Thanks for the reply and sorry for the slow response. I just got back around to 
this. In my particular case it was easiest to handle this using 
WriteBatchSpaced. Works for me now. Thanks!

-Mike

-----Original Message-----
From: Wes McKinney [mailto:[email protected]] 
Sent: Monday, May 15, 2017 16:04
To: [email protected]
Subject: Re: representing NA values

hi Mike,

I think you want to use WriteBatch on TypedColumnWriter:

https://github.com/apache/parquet-cpp/blob/master/src/parquet/column/writer.h#L166
 

For a flat table with an optional repetition type, the definition levels are a 
sequence of 1's and 0's, where 1 is for non-null values.
The array of values does not include nulls. We have an additional API 
WriteBatchSpaced for value arrays that include null slots. We use this in the 
Arrow writer:

https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/writer.cc#L391
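
To make the two layouts concrete, here is a minimal sketch of how the buffers 
for one batch of nullable doubles could be prepared. NullableBatch and 
BuildBatch are hypothetical names used only for illustration and are not part 
of parquet-cpp. The sketch assumes a flat OPTIONAL double column (maximum 
definition level 1) and an Arrow-style validity bitmap with the least 
significant bit first. The writer calls appear only in comments; check the 
header linked above for the exact signatures in your version.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not part of parquet-cpp): the buffers one batch needs.
struct NullableBatch {
  std::vector<int16_t> def_levels;     // one entry per row: 1 = present, 0 = null
  std::vector<double> dense_values;    // non-null values only (WriteBatch layout)
  std::vector<double> spaced_values;   // one slot per row (WriteBatchSpaced layout)
  std::vector<uint8_t> valid_bits;     // bitmap, bit i set when row i is non-null
};

// slots[i] is the value of row i; it is ignored when is_valid[i] is false.
// Both vectors are assumed to have the same length.
NullableBatch BuildBatch(const std::vector<double>& slots,
                         const std::vector<bool>& is_valid) {
  NullableBatch b;
  b.def_levels.reserve(slots.size());
  b.spaced_values.reserve(slots.size());
  b.valid_bits.assign((slots.size() + 7) / 8, 0);
  for (std::size_t i = 0; i < slots.size(); ++i) {
    const bool present = is_valid[i];
    b.def_levels.push_back(present ? 1 : 0);
    // Null slots keep a placeholder; their content is not meaningful.
    b.spaced_values.push_back(present ? slots[i] : 0.0);
    if (present) {
      b.dense_values.push_back(slots[i]);
      b.valid_bits[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
  return b;
}

// With a TypedColumnWriter for doubles, the calls would then look roughly like
// this (repetition levels are not needed for a flat column):
//   writer->WriteBatch(n, b.def_levels.data(), nullptr, b.dense_values.data());
//   writer->WriteBatchSpaced(n, b.def_levels.data(), nullptr,
//                            b.valid_bits.data(), 0, b.spaced_values.data());
// where n is the number of rows (definition levels) in the batch.
```

The dense layout suits data that is already stored without null slots, while 
the spaced layout avoids a compaction pass when the source array keeps one 
slot per row.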
 

You can also create arrow::DoubleArray and use the Arrow writer interface.

Hope this helps.

Wes

On Mon, May 15, 2017 at 2:01 PM, Julien Le Dem <[email protected]> wrote:
> Hi Mike,
> Is this a C++ question?
> Optional in the schema means it can be null/missing. It usually 
> translates into the definition level being 0 (null) or 1 (defined), 
> which is also how it is represented in an Arrow validity vector.
> Definition levels explanation here:
> https://blog.twitter.com/2013/dremel-made-simple-with-parquet
>
>
> On Mon, May 15, 2017 at 7:39 AM, Katelman, Michael < 
> [email protected]> wrote:
>
>> Hi,
>>
>> I was wondering if someone could help me understand what the correct 
>> way is to represent and write out NA values using a TypedColumnWriter.
>> Basically, what I want to do is, e.g., repeatedly write out batches 
>> of doubles where some of them may arbitrarily be None's. It seemed 
>> like maybe some combination of OPTIONAL, definition levels, and valid 
>> bits might work, but it wasn't exactly clear to me. So, if someone 
>> could point me in the right direction I would appreciate it.
>>
>> -Mike
>>
>>
>>
>>
>>
>> DISCLAIMER: This e-mail message and any attachments are intended 
>> solely for the use of the individual or entity to which it is 
>> addressed and may contain information that is confidential or legally 
>> privileged. If you are not the intended recipient, you are hereby 
>> notified that any dissemination, distribution, copying or other use 
>> of this message or its attachments is strictly prohibited. If you 
>> have received this message in error, please notify the sender 
>> immediately and permanently delete this message and any attachments.
>>
>>
>>
>>
>
>
> --
> Julien




