Does it work to add a null value?

 startField("lstint", 0)
  startField("bag", 0)
   addValue(7)
   addValue(null)
  endField("bag", 0)
 endField("lstint", 0)
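
As a point of comparison, here is a minimal sketch of another way to keep the null: write an empty "bag" group for a null element instead of adding a null value. It assumes that closing a group counts as writing something, so the empty-field check does not fire. RecordingConsumer is a hypothetical stand-in that only mimics that check; it is not the Parquet RecordConsumer class, and writeIntList is an illustrative helper. The field index 1 for lstint comes from its position in the schema quoted below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NullElementSketch {

    /** Hypothetical stand-in: records calls and mimics the empty-field check. */
    static class RecordingConsumer {
        final List<String> events = new ArrayList<>();
        private boolean emptyField = false;

        void startField(String name, int index) {
            emptyField = true; // nothing written in this field yet
            events.add("startField(" + name + ")");
        }

        void endField(String name, int index) {
            if (emptyField) {
                throw new IllegalStateException("empty fields are illegal");
            }
            events.add("endField(" + name + ")");
        }

        void startGroup() {
            events.add("startGroup");
        }

        void endGroup() {
            emptyField = false; // assumption: a closed group counts as content
            events.add("endGroup");
        }

        void addInteger(int value) {
            emptyField = false;
            events.add("addInteger(" + value + ")");
        }
    }

    /** Writes one list column; a null element becomes an empty "bag" group. */
    static void writeIntList(RecordingConsumer rc, String listName, int index,
                             List<Integer> values) {
        if (values == null) {
            return; // null list: omit the field completely
        }
        rc.startField(listName, index);
        rc.startGroup();
        if (!values.isEmpty()) {
            rc.startField("bag", 0);
            for (Integer v : values) {
                rc.startGroup();
                if (v != null) {
                    rc.startField("array_element", 0);
                    rc.addInteger(v);
                    rc.endField("array_element", 0);
                }
                rc.endGroup(); // empty group stands for a null element
            }
            rc.endField("bag", 0);
        }
        rc.endGroup();
        rc.endField(listName, index);
    }

    public static void main(String[] args) {
        RecordingConsumer rc = new RecordingConsumer();
        writeIntList(rc, "lstint", 1, Arrays.asList(7, null));
        rc.events.forEach(System.out::println);
    }
}
```

With this call sequence, the list [7, null] produces two "bag" groups, so the element count is preserved, and a list holding only nulls never leaves a field empty.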

rb

On 10/03/2014 07:54 AM, Mickaël Lacour wrote:
Hello all :)


I'm working on this issue: https://issues.apache.org/jira/browse/HIVE-6994

My dataset is very simple: 3 columns. Here is the schema (hadoop-tools schema):


message hive_schema {
   optional int32 id;
   optional group lstint (LIST) {
     repeated group bag {
       optional int32 array_element;
     }
   }
   optional group lststr (LIST) {
     repeated group bag {
       optional binary array_element;
     }
   }
}

And the content (hadoop-tools cat):

id = 2
lstint:
.bag:
..array_element = 7
lststr:
.bag:
..array_element = e
.bag:
..array_element = e

And here is the original data that I wanted to write ("|" is the column
delimiter, and "," is the element delimiter inside an array):

1|7,|e,e

Here is my issue: the size of my first array (lstint) should be 2, but Parquet
keeps only one element (the other one is null). So for Parquet the size of my
array is 1.
I want to keep this information and I don't know how to do it. Basically, I
cannot ask my recordConsumer to startField if I have no value to add. If I do,
I get this error when I ask the recordConsumer to endField:

throw new ParquetEncodingException("empty fields are illegal, the field should be 
ommitted completely instead");

So I can't do this, and the recordConsumer has no method to add an empty
field inside a "column". Of course, if my whole array is null, Parquet writes
a null for this missing column.

I have another, related issue: I cannot write an array that contains only
null elements (|,,,|). I get the same exception.

Any advice? (Should we add a new method to allow empty fields?)

@Julien: I'm adding you in CC because I didn't see the last mail I sent to the
mailing list. Can you forward it in case I don't have the right permission?
Thx!

--

Mickaël Lacour

Senior Software Engineer

Analytics Infrastructure team @Scalability

Criteo


--
Ryan Blue
Software Engineer
Cloudera, Inc.
