[
https://issues.apache.org/jira/browse/DRILL-8388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Turton updated DRILL-8388:
--------------------------------
Description:
I'll refine this ticket as I discover more, but at the current time I believe
this bug can be reproduced as follows.
# The Drill writer format is set to Parquet.
# A CTAS statement is issued over JDBC (the bug does not appear to manifest
for the same query received over REST).
# The CTAS statement spawns multiple Parquet writer fragments. It may also be
necessary that these fragments are distributed over more than one Drillbit
(unconfirmed on a single Drillbit).
# Some of the Parquet writer fragments receive batches containing zero records.
# The query is apparently cancelled (by the Drill/JDBC client?) before all of
the writer fragments have completed.
# Some writer fragments have created no output file at all. Others have
created invalid, zero-byte Parquet files. Others have created valid empty
Parquet files and others have created valid non-empty Parquet files.
# A subsequent query against the destination fails because it encounters
zero-byte Parquet files.
was:
I'll refine this ticket as I discover more, but at the current time I believe
this bug can be reproduced as follows.
# The Drill writer format is set to Parquet.
# A CTAS statement is issued over JDBC (the bug does not appear to manifest
for the same query received over REST).
# The CTAS statement spawns multiple Parquet writer fragments. It may also be
necessary that these fragments are distributed over more than one Drillbit
(unconfirmed on a single Drillbit).
# Some of the Parquet writer fragments receive batches containing zero records.
# The query completes but ends in the cancelled state.
# Invalid, zero-byte Parquet files are written to the writer destination by
the writer fragments that received zero records.
# A subsequent query against the destination fails because it encounters
zero-byte Parquet files.
> Zero-record Parquet writer fragments result in query cancellation and
> zero-byte Parquet files
> ---------------------------------------------------------------------------------------------
>
> Key: DRILL-8388
> URL: https://issues.apache.org/jira/browse/DRILL-8388
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Writer
> Affects Versions: 1.20.3
> Reporter: James Turton
> Assignee: James Turton
> Priority: Major
> Fix For: 1.21.0
>
>
> I'll refine this ticket as I discover more, but at the current time I believe
> this bug can be reproduced as follows.
> # The Drill writer format is set to Parquet.
> # A CTAS statement is issued over JDBC (the bug does not appear to manifest
> for the same query received over REST).
> # The CTAS statement spawns multiple Parquet writer fragments. It may also
> be necessary that these fragments are distributed over more than one Drillbit
> (unconfirmed on a single Drillbit).
> # Some of the Parquet writer fragments receive batches containing zero
> records.
> # The query is apparently cancelled (by the Drill/JDBC client?) before all
> of the writer fragments have completed.
> # Some writer fragments have created no output file at all. Others have
> created invalid, zero-byte Parquet files. Others have created valid empty
> Parquet files and others have created valid non-empty Parquet files.
> # A subsequent query against the destination fails because it encounters
> zero-byte Parquet files.
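
To check a CTAS destination for the file states described in the last two steps, something like the following rough sketch may help. It is not part of Drill: the function names are hypothetical, it assumes the destination is a directory on a local (or locally mounted) filesystem and that writer output files end in ".parquet", and it only inspects the file size and the 4-byte "PAR1" magic at the start and end of each file. A file reported as "plausible" may still be corrupt in other ways.

```python
import os

PARQUET_MAGIC = b"PAR1"
# Smallest structurally possible file: 4-byte leading magic,
# 4-byte footer length and 4-byte trailing magic.
MIN_PLAUSIBLE_SIZE = 12


def classify_parquet_file(path):
    """Return 'zero-byte', 'invalid' or 'plausible' for a single file.

    Hypothetical helper: it checks size and magic bytes only and does
    not parse the Parquet footer.
    """
    size = os.path.getsize(path)
    if size == 0:
        return "zero-byte"
    if size < MIN_PLAUSIBLE_SIZE:
        return "invalid"
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    if head == PARQUET_MAGIC and tail == PARQUET_MAGIC:
        return "plausible"
    return "invalid"


def scan_destination(directory):
    """Classify every *.parquet file directly inside `directory`.

    Assumes a local path; a destination on HDFS or object storage
    would need that store's own client instead of os.listdir.
    """
    results = {}
    for name in sorted(os.listdir(directory)):
        if name.endswith(".parquet"):
            full_path = os.path.join(directory, name)
            results[name] = classify_parquet_file(full_path)
    return results
```

Calling scan_destination on the CTAS output directory returns a dict mapping each file name to its classification. Any "zero-byte" entries are the files that a subsequent query against the destination would be expected to fail on. Fragments that created no file at all cannot be detected this way, since there is nothing to list.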
--
This message was sent by Atlassian Jira
(v8.20.10#820010)