[
https://issues.apache.org/jira/browse/SPARK-20593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viktor Khristenko updated SPARK-20593:
--------------------------------------
Description:
Hi,
This is my first ticket, so I apologize in advance if I'm doing anything improperly.
I have a dataset with the following schema:
{quote}
root
|-- muons: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- reco::Candidate: struct (nullable = true)
| | |-- qx3_: integer (nullable = true)
| | |-- pt_: float (nullable = true)
| | |-- eta_: float (nullable = true)
| | |-- phi_: float (nullable = true)
| | |-- mass_: float (nullable = true)
| | |-- vertex_: struct (nullable = true)
| | | |-- fCoordinates: struct (nullable = true)
| | | | |-- fX: float (nullable = true)
| | | | |-- fY: float (nullable = true)
| | | | |-- fZ: float (nullable = true)
| | |-- pdgId_: integer (nullable = true)
| | |-- status_: integer (nullable = true)
| | |-- cachePolarFixed_: struct (nullable = true)
| | |-- cacheCartesianFixed_: struct (nullable = true)
{quote}
As you can see, there are three empty structs in this schema: reco::Candidate, cachePolarFixed_, and cacheCartesianFixed_. I can read and manipulate the dataset without any problems. However, when I try to write it to disk as Parquet with {{ds.write.format("parquet").save(outputPathName)}}, I get the following exception:
{code}
java.lang.IllegalStateException: Cannot build an empty group
  at org.apache.parquet.Preconditions.checkState(Preconditions.java:91)
  at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:622)
  at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:497)
  at org.apache.parquet.schema.Types$Builder.named(Types.java:286)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:535)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField$1.apply(ParquetSchemaConverter.scala:534)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField$1.apply(ParquetSchemaConverter.scala:533)
{code}
So, basically, I would like to understand whether this is a bug or intended behavior. I assume it is related to the empty structs. Any help would be really appreciated!
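For illustration, here is a minimal sketch that reproduces the same exception from spark-shell. This is not my actual dataset; the column names are made up, and the only thing that matters is the empty StructType:
{code}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Two-column schema where the second column is an empty struct,
// analogous to cachePolarFixed_ / cacheCartesianFixed_ above.
val schema = StructType(Seq(
  StructField("pt", FloatType, nullable = true),
  StructField("emptyStruct", StructType(Nil), nullable = true)
))

// One dummy row; the write fails while Spark converts this schema to a
// Parquet schema (see ParquetSchemaConverter in the stack trace above),
// not because of the data itself.
val rows = spark.sparkContext.parallelize(Seq(Row(1.0f, Row())))
val df = spark.createDataFrame(rows, schema)

df.printSchema()                       // reading/inspecting works fine
df.write.parquet("/tmp/empty-struct")  // IllegalStateException: Cannot build an empty group
{code}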
I've also quickly created a stripped-down version without the empty structs, and that one writes without any issues!
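For the toy schema above, dropping the empty struct column before writing is enough (just a sketch; in my real schema the empty structs are nested inside the array element, so the stripped version needs a more involved projection):
{code}
// Drop the empty struct column and the write succeeds.
val stripped = df.drop("emptyStruct")
stripped.write.parquet("/tmp/stripped")
{code}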
For reference, here is a link to the original question on Stack Overflow [1].
VK
[1]
http://stackoverflow.com/questions/43767358/apache-spark-parquet-cannot-build-an-empty-group
> Writing Parquet: Cannot build an empty group
> --------------------------------------------
>
> Key: SPARK-20593
> URL: https://issues.apache.org/jira/browse/SPARK-20593
> Project: Spark
> Issue Type: Question
> Components: Spark Core, Spark Shell
> Affects Versions: 2.1.1
> Environment: Apache Spark 2.1.1 (2.1.0 showed the same behavior). Tested only on macOS.
> Reporter: Viktor Khristenko
> Priority: Minor
>