[
https://issues.apache.org/jira/browse/PARQUET-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200413#comment-17200413
]
Tristan Davolt edited comment on PARQUET-1773 at 9/22/20, 11:05 PM:
--------------------------------------------------------------------
It appears the issue was indeed due to the race condition mentioned above.
Adding a locking mechanism around _ParquetWriter_ seems to have resolved the issue. In
addition, we are no longer receiving the error reported in PARQUET-632.
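
The comment does not show how the lock was added, so the following is only a minimal sketch of one possible approach: serializing write() and close() behind a single ReentrantLock so that no two threads touch the underlying writer at once. All names here (RecordSink, LockedSink, CountingSink, LockedSinkDemo) are hypothetical. RecordSink is an assumed stand-in for the write/close surface of ParquetWriter so that the sketch compiles without parquet-mr on the classpath; it is not part of the Parquet API.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the write/close surface of ParquetWriter<T>.
// Assumption: the real writer would be adapted to this interface.
interface RecordSink<T> extends Closeable {
    void write(T record) throws IOException;
}

// Wraps a non-thread-safe sink and serializes write() and close().
final class LockedSink<T> implements RecordSink<T> {
    private final RecordSink<T> delegate;
    private final ReentrantLock lock = new ReentrantLock();
    private boolean closed = false; // guarded by lock

    LockedSink(RecordSink<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(T record) throws IOException {
        lock.lock();
        try {
            // Reject writes that race with (or follow) close().
            if (closed) {
                throw new IOException("writer already closed");
            }
            delegate.write(record);
        } finally {
            lock.unlock();
        }
    }

    @Override
    public void close() throws IOException {
        lock.lock();
        try {
            if (closed) {
                return; // make close() idempotent
            }
            closed = true;
            delegate.close();
        } finally {
            lock.unlock();
        }
    }
}

// Deliberately non-thread-safe sink that only counts records.
final class CountingSink implements RecordSink<String> {
    int count = 0;

    @Override
    public void write(String record) {
        count++; // unsynchronized on purpose; LockedSink provides the exclusion
    }

    @Override
    public void close() {
        // nothing to release in this sketch
    }
}

final class LockedSinkDemo {
    // Writes perThread records from each of `threads` threads and
    // returns how many records reached the underlying sink.
    static int run(int threads, int perThread) {
        CountingSink raw = new CountingSink();
        LockedSink<String> sink = new LockedSink<>(raw);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    sink.write("msg");
                }
                return null; // Callable, so write() may throw IOException
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish in time");
            }
            sink.close();
        } catch (InterruptedException | IOException e) {
            throw new IllegalStateException(e);
        }
        return raw.count;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000));
    }
}
```

Holding the same lock in close() as in write() is the relevant design choice: the stack trace in this issue passes through ParquetWriter.close, so a lock that only covered write() would still allow a flush to overlap with a close.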
was (Author: tdavolt):
It appears the issue was indeed due to the race condition mentioned above.
Adding a locking mechanism around _ParquetWriter_ seems to have resolved the issue. In
addition, we are no longer receiving the error reported in
[Parquet-632|https://issues.apache.org/jira/browse/PARQUET-632].
> Parquet file in invalid state while writing to S3 when calling
> ParquetWriter.write
> ----------------------------------------------------------------------------------
>
> Key: PARQUET-1773
> URL: https://issues.apache.org/jira/browse/PARQUET-1773
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.10.0
> Reporter: Tristan Davolt
> Priority: Major
>
> This may be related to PARQUET-632. I am also writing parquet to S3, but I am
> calling ParquetWriter.write directly. I have multiple containerized instances
> consuming messages from Kafka, converting them to Parquet, and then writing
> to S3. One instance will begin to throw this exception for all new messages.
> Sometimes, the container will recover. Other times, it must be restarted
> manually to recover. I am unable to find any "error thrown previously."
> Exception:
> java.io.IOException
> Message:
> The file being written is in an invalid state. Probably caused by an error
> thrown previously. Current state: BLOCK
> Stacktrace:
> {code:java}
> org.apache.parquet.hadoop.ParquetFileWriter$STATE.error(ParquetFileWriter.java:168)
> org.apache.parquet.hadoop.ParquetFileWriter$STATE.startBlock(ParquetFileWriter.java:160)
> org.apache.parquet.hadoop.ParquetFileWriter.startBlock(ParquetFileWriter.java:291)
> org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:171)
> org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:114)
> org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:308)
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)