jinchengchenghh commented on PR #5447: URL: https://github.com/apache/incubator-gluten/pull/5447#issuecomment-2105615316
TPC-H SF2000 Q6 performance. Query:

```sql
select sum(l_extendedprice * l_discount) as revenue
from lineitem
where l_shipdate >= '1994-01-01'
  and l_shipdate < '1995-01-01'
  and l_discount between .06 - 0.01 and .06 + 0.01
  and l_quantity < 24
```

lineitem data: 622 GB

| CSV, Gluten without native reader | CSV, Gluten with native CSV reader |
| -- | -- |
| 8333.039907 | 2456 |

Test script:

```scala
// Run in spark-shell, where `spark` is predefined
import org.apache.spark.sql.types._

val schema = new StructType()
  .add("l_orderkey", LongType).add("l_partkey", LongType)
  .add("l_suppkey", LongType).add("l_linenumber", LongType)
  .add("l_quantity", DoubleType).add("l_extendedprice", DoubleType)
  .add("l_discount", DoubleType).add("l_tax", DoubleType)
  .add("l_returnflag", StringType).add("l_linestatus", StringType)
  .add("l_shipdate", DateType).add("l_commitdate", DateType)
  .add("l_receiptdate", DateType).add("l_shipinstruct", StringType)
  .add("l_shipmode", StringType).add("l_comment", StringType)

val lineitem = spark.read.format("csv")
  .option("header", "true")
  .schema(schema)
  .load("file:///mnt/DP_disk2/tpch/csvdata/")
// Register the DataFrame so the query can refer to it as `lineitem`
lineitem.createOrReplaceTempView("lineitem")

// TPC-H Q6, same text as above
val q6 = """select sum(l_extendedprice * l_discount) as revenue from lineitem
  |where l_shipdate >= '1994-01-01' and l_shipdate < '1995-01-01'
  |and l_discount between .06 - 0.01 and .06 + 0.01 and l_quantity < 24""".stripMargin
spark.sql(q6).show() // show() is the action that actually runs the query
```

Note: because the file schema should match the Arrow schema, specify the schema explicitly with `.schema(arrow_matched_schema)`.
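Assuming both figures in the table are elapsed times in the same unit (the unit is not stated), the native CSV reader's speedup on Q6 works out to roughly 3.4×:

```math
\text{speedup} = \frac{t_{\text{without native reader}}}{t_{\text{native CSV reader}}} = \frac{8333.039907}{2456} \approx 3.39
```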
