Hi, All.

Today, while looking over JIRA issues for Apache Spark 2.2.0,
I noticed that there are many unresolved community requests and related efforts
around `Feature parity for ORC with Parquet`.
Some examples I found are listed below. I created SPARK-20901 to organize
them, although I am not in an official position to do so.
Please let me know if this is not the proper way to do it in the Apache Spark community.
I think we can leverage or carry over the improvements already made for Parquet in Spark.

SPARK-11412   Support merge schema for ORC
SPARK-12417   Orc bloom filter options are not propagated during file write in spark
SPARK-14286   Empty ORC table join throws exception
SPARK-14387   Enable Hive-1.x ORC compatibility with spark.sql.hive.convertMetastoreOrc
SPARK-15347   Problem select empty ORC table
SPARK-15474   ORC data source fails to write and read back empty dataframe
SPARK-15682   Hive ORC partition write looks for root hdfs folder for existence
SPARK-15731   orc writer directory permissions
SPARK-15757   Error occurs when using Spark sql "select" statement on orc file …
SPARK-16060   Vectorized Orc reader
SPARK-16628   OrcConversions should not convert an ORC table represented by MetastoreRelation to HadoopFsRelation if …
SPARK-17047   Spark 2 cannot create ORC table when CLUSTERED
SPARK-18355   Spark SQL fails to read data from a ORC hive table that has a new column added to it
SPARK-18540   Wholestage code-gen for ORC Hive tables
SPARK-19109   ORC metadata section can sometimes exceed protobuf message size limit
SPARK-19122   Unnecessary shuffle+sort added if join predicates ordering differ from bucketing and sorting order
SPARK-19430   Cannot read external tables with VARCHAR columns if they're backed by ORC files written by Hive 1.2.1
SPARK-19809   NullPointerException on empty ORC file
SPARK-20515   Issue with reading Hive ORC tables having char/varchar columns in Spark SQL
SPARK-20682   Implement new ORC data source based on Apache ORC
SPARK-20728   Make ORCFileFormat configurable between sql/hive and sql/core
SPARK-20799   Unable to infer schema for ORC on reading ORC from S3

Best,
Dongjoon.
