[
https://issues.apache.org/jira/browse/PARQUET-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056366#comment-16056366
]
Ashima Sood commented on PARQUET-1036:
--------------------------------------
Also, when using spark-sql on Unix, the results returned are as follows:
spark-sql> select * from <table> limit 5;
17/06/20 19:53:58 INFO SparkSqlParser: Parsing command: select * from <table>
limit 5
17/06/20 19:53:58 INFO BlockManagerInfo: Removed broadcast_0_piece0 on
10.107.206.68:40150 in memory (size: 2.1 KB, free: 413.9 MB)
17/06/20 19:53:58 INFO BlockManagerInfo: Removed broadcast_0_piece0 on
ip-10-107-206-78.fmrco.com:33367 in memory (size: 2.1 KB, free: 2.8 GB)
17/06/20 19:53:58 INFO ContextCleaner: Cleaned accumulator 0
17/06/20 19:53:58 INFO CatalystSqlParser: Parsing command: string
[this line appears 110 times in total in the original output]
17/06/20 19:54:00 INFO FileSourceStrategy: Pruning directories with:
17/06/20 19:54:00 INFO FileSourceStrategy: Post-Scan Filters:
17/06/20 19:54:00 INFO FileSourceStrategy: Output Data Schema: struct<date:
string, row_id: string, status: string, gen_time: string, gen_month: string ...
108 more fields>
17/06/20 19:54:00 INFO FileSourceStrategy: Pushed Filters:
17/06/20 19:54:00 WARN Utils: Truncated the string representation of a plan
since it was too large. This behavior can be adjusted by setting
'spark.debug.maxToStringFields' in SparkEnv.conf.
17/06/20 19:54:00 INFO MemoryStore: Block broadcast_1 stored as values in
memory (estimated size 350.8 KB, free 413.6 MB)
17/06/20 19:54:00 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in
memory (estimated size 31.9 KB, free 413.6 MB)
17/06/20 19:54:00 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on
10.107.206.68:40150 (size: 31.9 KB, free: 413.9 MB)
17/06/20 19:54:00 INFO SparkContext: Created broadcast 1 from processCmd at
CliDriver.java:376
17/06/20 19:54:00 INFO FileSourceScanExec: Planning scan with bin packing, max
size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
17/06/20 19:54:00 INFO SparkContext: Starting job: processCmd at
CliDriver.java:376
17/06/20 19:54:00 INFO DAGScheduler: Got job 1 (processCmd at
CliDriver.java:376) with 1 output partitions
17/06/20 19:54:00 INFO DAGScheduler: Final stage: ResultStage 1 (processCmd at
CliDriver.java:376)
17/06/20 19:54:00 INFO DAGScheduler: Parents of final stage: List()
17/06/20 19:54:00 INFO DAGScheduler: Missing parents: List()
17/06/20 19:54:00 INFO DAGScheduler: Submitting ResultStage 1
(MapPartitionsRDD[6] at processCmd at CliDriver.java:376), which has no missing
parents
17/06/20 19:54:00 INFO MemoryStore: Block broadcast_2 stored as values in
memory (estimated size 18.8 KB, free 413.5 MB)
17/06/20 19:54:00 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in
memory (estimated size 7.5 KB, free 413.5 MB)
17/06/20 19:54:00 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on
10.107.206.68:40150 (size: 7.5 KB, free: 413.9 MB)
17/06/20 19:54:00 INFO SparkContext: Created broadcast 2 from broadcast at
DAGScheduler.scala:996
17/06/20 19:54:00 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 1 (MapPartitionsRDD[6] at processCmd at CliDriver.java:376)
17/06/20 19:54:00 INFO YarnScheduler: Adding task set 1.0 with 1 tasks
17/06/20 19:54:00 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1,
ip-10-107-206-78.fmrco.com, executor 1, partition 0, RACK_LOCAL, 6573 bytes)
17/06/20 19:54:00 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on
ip-10-107-206-78.fmrco.com:33367 (size: 7.5 KB, free: 2.8 GB)
17/06/20 19:54:01 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on
ip-10-107-206-78.fmrco.com:33367 (size: 31.9 KB, free: 2.8 GB)
17/06/20 19:54:05 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1)
in 4476 ms on ip-10-107-206-78.fmrco.com (executor 1) (1/1)
17/06/20 19:54:05 INFO YarnScheduler: Removed TaskSet 1.0, whose tasks have all
completed, from pool
17/06/20 19:54:05 INFO DAGScheduler: ResultStage 1 (processCmd at
CliDriver.java:376) finished in 4.477 s
17/06/20 19:54:05 INFO DAGScheduler: Job 1 finished: processCmd at
CliDriver.java:376, took 4.518839 s
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
Time taken: 6.702 seconds, Fetched 5 row(s)
17/06/20 19:54:05 INFO CliDriver: Time taken: 6.702 seconds, Fetched 5 row(s)
> parquet file created via pyarrow 0.4.0 ; version 1.0 - incompatible with Spark
> ------------------------------------------------------------------------------
>
> Key: PARQUET-1036
> URL: https://issues.apache.org/jira/browse/PARQUET-1036
> Project: Parquet
> Issue Type: Bug
> Reporter: Ashima Sood
> Priority: Blocker
>
> Using Spark SQL, I am unable to read the Parquet file: it shows NULL values,
> whereas Hive reads the values fine.
> 17/06/19 17:50:36 WARN CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-cpp version 1.0.0
> org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-cpp version 1.0.0 using format: (.+) version ((.*) )?\(build ?(.*)\)
>     at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
>     at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:60)
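The parse failure in the quoted WARN line can be reproduced outside Spark. This is a minimal sketch in Python rather than Java, assuming the pattern printed in the exception message is the one parquet-mr's VersionParser applies and that it is applied as a whole-string match (mirrored here with re.fullmatch). The helper name parse_created_by and the second, well-formed created_by string are hypothetical and only serve as a comparison.

```python
import re

# Pattern copied verbatim from the VersionParseException message above.
# Assumption: it is matched against the whole created_by string, as
# Java's Matcher.matches() would do; re.fullmatch mirrors that here.
CREATED_BY_FORMAT = r"(.+) version ((.*) )?\(build ?(.*)\)"


def parse_created_by(created_by):
    """Hypothetical helper: return (application, version, build), or None if unparseable."""
    m = re.fullmatch(CREATED_BY_FORMAT, created_by)
    if m is None:
        return None
    return m.group(1), m.group(3), m.group(4)


# Value reported in the WARN line: there is no "(build ...)" suffix, so no match.
print(parse_created_by("parquet-cpp version 1.0.0"))  # None

# Hypothetical string in the shape the pattern expects, for comparison.
print(parse_created_by("parquet-mr version 1.8.1 (build abc123)"))
# ('parquet-mr', '1.8.1', 'abc123')
```

Because the parquet-cpp value lacks the "(build ...)" part the pattern requires, the match fails, which is consistent with the WARN reporting that the statistics are ignored.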