[ https://issues.apache.org/jira/browse/PARQUET-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gang Wu updated PARQUET-2026:
-----------------------------
Fix Version/s: 1.14.0 (was: 1.13.0)
> Allow empty row in parquet file
> -------------------------------
>
> Key: PARQUET-2026
> URL: https://issues.apache.org/jira/browse/PARQUET-2026
> Project: Parquet
> Issue Type: Task
> Components: parquet-mr
> Affects Versions: 1.12.0
> Reporter: Vitalii Diravka
> Priority: Major
> Labels: Drill, empty-file
> Fix For: 1.14.0
>
> Attachments: Screenshot from 2021-04-13 08-52-56.png
>
>
> Since PARQUET-1851, it is no longer possible to write Parquet files that have
> a schema (meta information) but 0 rows, i.e. empty files.
> As a result, Drill can no longer store empty tables as Parquet files,
> for example:
> {code:sql}
> CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0{code}
> {code:sql}
> CREATE TABLE dfs.tmp.%s AS select * from
> dfs.`parquet/alltypes_required.parquet` where `col_int` = 0{code}
> {code:sql}
> create table dfs.tmp.%s as select * from
> dfs.`parquet/empty/complex/empty_complex.parquet`{code}
> So PARQUET-1851 breaks the following test cases:
> {code:java}
> TestUntypedNull.testParquetTableCreation
> TestParquetWriterEmptyFiles.testComplexEmptyFileSchema
> TestParquetWriterEmptyFiles.testWriteEmptyFile
> TestParquetWriterEmptyFiles.testWriteEmptyFileWithSchema
> TestParquetWriterEmptyFiles.testWriteEmptySchemaChange
> TestMetastoreCommands.testAnalyzeEmptyRequiredParquetTable
> TestMetastoreCommands.testSelectEmptyRequiredParquetTable{code}
> I suggest logging a warning when an empty Parquet file is created, or
> adding an alternative _endBlock_ for backward compatibility with other tools:
> !Screenshot from 2021-04-13 08-52-56.png!
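The suggestion above (warn instead of fail when a block ends with zero rows) could look roughly like the following Java sketch. It is only an illustration under stated assumptions: EmptyBlockSketch, its allowEmptyBlocks switch, and the choice to skip the empty block are hypothetical and are not the parquet-mr ParquetFileWriter API or the change made for this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-in for a row-group writer; NOT the parquet-mr API.
final class EmptyBlockSketch {
    private static final Logger LOG = Logger.getLogger(EmptyBlockSketch.class.getName());

    // Assumed compatibility switch: true = warn on empty blocks, false = fail.
    private final boolean allowEmptyBlocks;
    private final List<Long> blockRowCounts = new ArrayList<>();
    private long currentRowCount;
    private boolean blockOpen;

    EmptyBlockSketch(boolean allowEmptyBlocks) {
        this.allowEmptyBlocks = allowEmptyBlocks;
    }

    void startBlock() {
        currentRowCount = 0;
        blockOpen = true;
    }

    void writeRow() {
        if (!blockOpen) {
            throw new IllegalStateException("No open block");
        }
        currentRowCount++;
    }

    // Returns true if the block was recorded, false if it was empty and skipped.
    boolean endBlock() {
        if (!blockOpen) {
            throw new IllegalStateException("No open block");
        }
        blockOpen = false;
        if (currentRowCount == 0) {
            if (!allowEmptyBlocks) {
                // Strict behaviour: an empty block is treated as an error.
                throw new IllegalStateException("End block with zero records");
            }
            // Lenient behaviour: warn and skip, leaving a schema-only file.
            LOG.warning("Block ended with zero records; only the schema will be stored");
            return false;
        }
        blockRowCounts.add(currentRowCount);
        return true;
    }

    List<Long> blockRowCounts() {
        return blockRowCounts;
    }
}
```

With allowEmptyBlocks set to true, a caller such as a Drill CTAS that selects no rows would finish with a warning and no recorded blocks. With it set to false, the same call sequence fails at endBlock.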
--
This message was sent by Atlassian Jira
(v8.20.10#820010)