zeroshade commented on issue #727:
URL: https://github.com/apache/arrow-go/issues/727#issuecomment-4128576847

   Looking through the code, one thing I've noticed is that we don't actually 
validate that the schema of the record batch matches the schema of the parquet 
file being written. If the schemas don't line up, or the record batch has more 
columns than expected, that could produce the error you're encountering. For 
example, the writer attempts to load the encoder for a given int64 or 
string/binary column in the record batch; if there's no corresponding leaf 
column in the parquet schema, the encoder will be `nil` instead of the 
expected type. 
   
   Can you try dumping the actual schema of the record batch being written at 
the point you encounter this error and validate it against the schema that was 
used when you started the file? Make sure they are *exactly* identical.
   
   As for the error you're encountering after WriteBuffered fails: parquet 
requires ALL column chunks in a row group to contain the same number of rows. 
If writing a chunk of rows to one column fails partway through, the columns 
written before it already hold more rows than the failed one. The check 
performed when closing the file then triggers the error you're seeing, because 
that column has fewer rows than the metadata and the other columns indicate.
   
   This is probably a case where we can improve the error output to better 
report what is going on to the user.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
