[ 
https://issues.apache.org/jira/browse/PARQUET-764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15641602#comment-15641602
 ] 

Uwe L. Korn commented on PARQUET-764:
-------------------------------------

PR: https://github.com/apache/parquet-cpp/pull/185

> [CPP] Parquet Writer does not write Boolean values correctly
> ------------------------------------------------------------
>
>                 Key: PARQUET-764
>                 URL: https://issues.apache.org/jira/browse/PARQUET-764
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Deepak Majeti
>            Assignee: Uwe L. Korn
>
> The root cause is in 
> https://github.com/apache/parquet-cpp/blob/master/src/parquet/encodings/plain-encoding.h#L203
> Bit packing happens on every Write(), but the packing is done at the byte 
> level. If the number of (1-bit) values passed to a Write() is not a multiple 
> of 8, the last byte is padded with incorrect values (false for boolean).
> To reproduce, add this test to src/parquet/column/column-writer-test.cc:
> {code}
> using TestBooleanValuesWriter = TestPrimitiveWriter<BooleanType>;
> TEST_F(TestBooleanValuesWriter, AlternateBooleanValues) {
>   this->SetUpSchema(Repetition::REQUIRED);
>   auto writer = this->BuildWriter();
>   // Write one value per WriteBatch() call so every call packs a single bit.
>   for (int i = 0; i < SMALL_SIZE; i++) {
>     bool value = (i % 2 == 0);
>     writer->WriteBatch(1, nullptr, nullptr, &value);
>   }
>   writer->Close();
>   this->ReadColumn();
>   // The values read back should alternate true/false exactly as written.
>   for (int i = 0; i < SMALL_SIZE; i++) {
>     ASSERT_EQ(i % 2 == 0, this->values_out_[i]) << i;
>   }
> }
> {code}
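
For readers who want to see the failure mode in isolation, here is a minimal standalone sketch. It is not the parquet-cpp code and not the fix in the PR: PackBits, EncodePerWrite, EncodeBuffered and UnpackBits are hypothetical names invented for this illustration, and the LSB-first bit order is assumed to match Parquet's PLAIN boolean encoding. It only shows why packing to a byte boundary on every write corrupts the bit stream, while packing once over all buffered values does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pack booleans LSB-first; unused bits of the last
// byte stay 0 (that is, they read back as "false").
std::vector<uint8_t> PackBits(const std::vector<bool>& values) {
  std::vector<uint8_t> out((values.size() + 7) / 8, 0);
  for (size_t i = 0; i < values.size(); ++i) {
    if (values[i]) out[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
  }
  return out;
}

// Faulty pattern: each one-value write is packed on its own, so every
// write emits a whole byte containing 1 real bit and 7 padding bits.
std::vector<uint8_t> EncodePerWrite(const std::vector<bool>& values) {
  std::vector<uint8_t> out;
  for (bool v : values) {
    std::vector<uint8_t> packed = PackBits(std::vector<bool>{v});
    out.insert(out.end(), packed.begin(), packed.end());
  }
  return out;
}

// One possible remedy (an assumption, not necessarily what the PR does):
// buffer all values and pack them once, so padding occurs only at the end.
std::vector<uint8_t> EncodeBuffered(const std::vector<bool>& values) {
  return PackBits(values);
}

// Reader side: treat the bytes as one contiguous LSB-first bit stream.
// Requires bytes.size() * 8 >= n.
std::vector<bool> UnpackBits(const std::vector<uint8_t>& bytes, size_t n) {
  std::vector<bool> out(n);
  for (size_t i = 0; i < n; ++i) {
    out[i] = ((bytes[i / 8] >> (i % 8)) & 1u) != 0;
  }
  return out;
}
```

With 16 alternating values starting at true, EncodeBuffered produces two bytes that decode back to the input, whereas EncodePerWrite produces 16 bytes whose first 16 bits decode to a single true followed by padding, which is the kind of mismatch the test above reports.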



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
