[ https://issues.apache.org/jira/browse/SPARK-26985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795931#comment-16795931 ]
Anuja Jakhade edited comment on SPARK-26985 at 3/19/19 10:24 AM:
-----------------------------------------------------------------

Hi [~srowen], [~hyukjin.kwon]

I have observed that after changing the ByteOrder in _*[OnHeapColumnVector.java|https://github.com/apache/spark/blob/v2.3.2/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java]*_ to *ByteOrder.BIG_ENDIAN*, the tests pass, because the float and double data is then read properly. However, in that case some tests in the Parquet module fail, e.g. *ParquetIOSuite*.

Is there any specific reason why we use LITTLE_ENDIAN as the ByteOrder even when bigEndianPlatform is true?

The above fix, however, does not work for all test cases: the requirements of *ParquetIOSuite* and of *DataFrameTungstenSuite/InMemoryColumnarQuerySuite* conflict with each other. *ParquetIOSuite* passes only when ByteOrder is set to ByteOrder.LITTLE_ENDIAN, and *DataFrameTungstenSuite/InMemoryColumnarQuerySuite* pass only when ByteOrder is set to ByteOrder.BIG_ENDIAN.
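To illustrate the kind of symptom described above, here is a minimal standalone sketch. It is not Spark code: the class name ByteOrderDemo and its helper methods are hypothetical, and it only assumes that a float was written in one byte order and then read back in the other. It uses nothing beyond java.nio.ByteBuffer and java.nio.ByteOrder.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class, not part of Spark.
public class ByteOrderDemo {
    // Serialize a float into 4 bytes using the given byte order.
    static byte[] encode(float value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putFloat(value).array();
    }

    // Interpret 4 bytes as a float using the given byte order.
    static float decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getFloat();
    }

    public static void main(String[] args) {
        float original = 1.5f;
        // Bytes laid out the way a big-endian writer would produce them.
        byte[] bigEndianBytes = encode(original, ByteOrder.BIG_ENDIAN);

        // Reading with the same order recovers the value.
        float sameOrder = decode(bigEndianBytes, ByteOrder.BIG_ENDIAN);
        // Reading with the opposite order reverses the bytes and yields a different float.
        float mismatched = decode(bigEndianBytes, ByteOrder.LITTLE_ENDIAN);

        System.out.println("native order: " + ByteOrder.nativeOrder());
        System.out.println("same order:   " + sameOrder);
        System.out.println("mismatched:   " + mismatched);
    }
}
```

Whether this is exactly what happens inside OnHeapColumnVector on a big-endian platform still needs to be confirmed. The sketch only shows that the writer and the reader of a buffer have to agree on the byte order, which would be consistent with one suite needing LITTLE_ENDIAN and the others needing BIG_ENDIAN.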
> Test "access only some column of the all of columns " fails on big endian
> -------------------------------------------------------------------------
>
>                 Key: SPARK-26985
>                 URL: https://issues.apache.org/jira/browse/SPARK-26985
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.2
>         Environment: Linux Ubuntu 16.04
> openjdk version "1.8.0_202"
> OpenJDK Runtime Environment (build 1.8.0_202-b08)
> Eclipse OpenJ9 VM (build openj9-0.12.1, JRE 1.8.0 64-Bit Compressed
> References 20190205_218 (JIT enabled, AOT enabled)
> OpenJ9 - 90dd8cb40
> OMR - d2f4534b
> JCL - d002501a90 based on jdk8u202-b08)
>
>            Reporter: Anuja Jakhade
>            Priority: Major
>              Labels: BigEndian
>         Attachments: DataFrameTungstenSuite.txt,
> InMemoryColumnarQuerySuite.txt, access only some column of the all of
> columns.txt
>
>
> While running tests on Apache Spark v2.3.2 with AdoptJDK on big endian, I am
> observing test failures for 2 Suites of Project SQL.
> 1. InMemoryColumnarQuerySuite
> 2. DataFrameTungstenSuite
> In both the cases test "access only some column of the all of columns" fails
> due to mismatch in the final assert.
> Observed that the data obtained after df.cache() is causing the error. Please
> find attached the log with the details.
> cache() works perfectly fine if double and float values are not in picture.
> Inside test !!!!!!- access only some column of the all of columns *** FAILED
> ***

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)