Deepak Majeti created PARQUET-764:
-------------------------------------

             Summary: [CPP] Parquet Writer does not write Boolean values correctly
                 Key: PARQUET-764
                 URL: https://issues.apache.org/jira/browse/PARQUET-764
             Project: Parquet
          Issue Type: Bug
            Reporter: Deepak Majeti
            Assignee: Deepak Majeti


The root cause is in
https://github.com/apache/parquet-cpp/blob/master/src/parquet/encodings/plain-encoding.h#L203
Bit packing is performed separately on every Write() call, at byte granularity. If the number of (1-bit) values passed to a call is not a multiple of 8, the last byte is padded with zero bits. The reader later decodes that padding as spurious values (false for boolean), which misaligns the values written by subsequent calls.
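The failure can be modeled outside parquet-cpp with a small, self-contained sketch. It assumes PLAIN booleans are bit-packed LSB first (as the Parquet format specifies) and that the reader decodes the column chunk as one contiguous bit stream. PackPerCall and ReadBit are hypothetical helpers written for illustration, not functions from plain-encoding.h.

{code:cpp}
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical helper modeling the reported behavior: each call packs its own
// values LSB-first into whole bytes and zero-pads the last byte.
void PackPerCall(const std::vector<bool>& values, std::vector<uint8_t>* out) {
  size_t start = out->size();
  out->resize(start + (values.size() + 7) / 8, 0);
  for (size_t i = 0; i < values.size(); ++i) {
    if (values[i]) (*out)[start + i / 8] |= static_cast<uint8_t>(1u << (i % 8));
  }
}

// The reader treats the buffer as one contiguous bit stream.
bool ReadBit(const std::vector<uint8_t>& buf, size_t i) {
  return (buf[i / 8] >> (i % 8)) & 1;
}

int main() {
  std::vector<uint8_t> buf;
  const int kNumValues = 4;
  // Mirror the test: one value per write call, alternating true/false.
  for (int i = 0; i < kNumValues; ++i) {
    PackPerCall({i % 2 == 0}, &buf);
  }
  // Four 1-bit values now occupy four bytes instead of one partial byte.
  assert(buf.size() == static_cast<size_t>(kNumValues));
  for (int i = 0; i < kNumValues; ++i) {
    std::printf("value %d: expected %d, read %d\n", i, i % 2 == 0,
                ReadBit(buf, i));
  }
  return 0;
}
{code}

Here the reader sees the padding bits of the first byte where values 1, 2 and 3 should be, so value 2 reads back as false instead of true.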

To reproduce, add the following test to src/parquet/column/column-writer-test.cc:
{code}
using TestBooleanValuesWriter = TestPrimitiveWriter<BooleanType>;
TEST_F(TestBooleanValuesWriter, AlternateBooleanValues) {
  this->SetUpSchema(Repetition::REQUIRED);
  auto writer = this->BuildWriter();
  // Write one value per WriteBatch() call, so each call packs fewer than 8 bits.
  for (int i = 0; i < SMALL_SIZE; i++) {
    bool value = (i % 2 == 0);
    writer->WriteBatch(1, nullptr, nullptr, &value);
  }
  writer->Close();
  this->ReadColumn();
  // The values must read back in the same alternating true/false pattern.
  for (int i = 0; i < SMALL_SIZE; i++) {
    ASSERT_EQ(i % 2 == 0, this->values_out_[i]) << i;
  }
}
{code}
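One possible direction for a fix, sketched here under the assumption that the encoder can keep state between Write() calls, is to carry the partially filled byte across calls and pad only once, when the column chunk is closed. BitWriter, Put and Flush are hypothetical names for illustration, not the parquet-cpp API.

{code:cpp}
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical bit writer: carries a partially filled byte across Put() calls
// and zero-pads only once, in Flush().
class BitWriter {
 public:
  void Put(const bool* values, int num_values) {
    for (int i = 0; i < num_values; ++i) {
      if (values[i]) current_ |= static_cast<uint8_t>(1u << bit_offset_);
      if (++bit_offset_ == 8) {  // byte full: emit it and start a new one
        buffer_.push_back(current_);
        current_ = 0;
        bit_offset_ = 0;
      }
    }
  }

  // Emit the trailing partial byte (zero-padded) exactly once, at close.
  const std::vector<uint8_t>& Flush() {
    if (bit_offset_ > 0) {
      buffer_.push_back(current_);
      current_ = 0;
      bit_offset_ = 0;
    }
    return buffer_;
  }

 private:
  std::vector<uint8_t> buffer_;
  uint8_t current_ = 0;
  int bit_offset_ = 0;
};

int main() {
  const int kNumValues = 20;  // deliberately not a multiple of 8
  BitWriter writer;
  for (int i = 0; i < kNumValues; ++i) {
    bool value = (i % 2 == 0);
    writer.Put(&value, 1);  // one value per call, as in the test above
  }
  const std::vector<uint8_t>& buf = writer.Flush();
  assert(buf.size() == static_cast<size_t>((kNumValues + 7) / 8));
  // Decode as one contiguous LSB-first bit stream and check the pattern.
  for (int i = 0; i < kNumValues; ++i) {
    bool read = (buf[i / 8] >> (i % 8)) & 1;
    assert(read == (i % 2 == 0));
  }
  std::printf("%d values round-tripped in %zu bytes\n", kNumValues, buf.size());
  return 0;
}
{code}

Because padding happens only in Flush(), the number of values per call no longer matters; only the final byte of the chunk carries padding, and the reader can ignore it because it knows the value count.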



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
