wangfei created SPARK-3720: ------------------------------ Summary: support ORC in spark sql Key: SPARK-3720 URL: https://issues.apache.org/jira/browse/SPARK-3720 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 1.1.0 Reporter: wangfei
The Optimized Row Columnar (ORC) file format provides a highly efficient way to store data on hdfs.ORC file format has many advantages such as: 1 a single file as the output of each task, which reduces the NameNode's load 2 Hive type support including datetime, decimal, and the complex types (struct, list, map, and union) 3 light-weight indexes stored within the file skip row groups that don't pass predicate filtering seek to a given row 4 block-mode compression based on data type run-length encoding for integer columns dictionary encoding for string columns 5 concurrent reads of the same file using separate RecordReaders 6 ability to split files without scanning for markers 7 bound the amount of memory needed for reading or writing 8 metadata stored using Protocol Buffers, which allows addition and removal of fields -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org