liujiayi771 opened a new issue, #7999:
URL: https://github.com/apache/incubator-gluten/issues/7999

   ### Description
   
   Currently, Parquet files written by Gluten get names such as
   `Gluten_Stage_3_TID_2124_VTID_257_0_3_0946dfb5-f773-42c9-ac8e-d4e70bede02b.parquet`.
   This name comes from the default behavior in Velox's `HiveDataSink.cpp`:
   ```cpp
   targetFileName = fmt::format(
           "{}_{}_{}_{}",
           connectorQueryCtx_->taskId(),
           connectorQueryCtx_->driverId(),
           connectorQueryCtx_->planNodeId(),
           makeUuid());
   ```
   
   https://github.com/facebookincubator/velox/pull/10903 adds a new
   `targetFileName` to `LocationHandle`, so Gluten can specify a
   `targetFileName` that includes the compression codec suffix. This makes the
   file names more consistent with the Parquet file names generated by vanilla
   Spark.
   
   Parquet files generated by Spark are named
   `part-uuid.<codec-extension>.parquet`. I have defined the name of Parquet
   files written by Gluten as `gluten-part-uuid.<codec-extension>.parquet`; the
   `gluten` prefix indicates that the file was generated by Gluten.
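
To make the proposed naming scheme concrete, here is a minimal sketch of how such a name could be assembled. It is not the actual Gluten or Velox code: `codecExtension` and `makeGlutenFileName` are hypothetical helpers, and the codec-to-extension mapping is an assumption for illustration only.

```cpp
#include <string>

// Hypothetical helper: maps a compression codec name to the extension
// fragment placed before ".parquet". The mapping is an illustrative
// assumption; uncompressed or unknown codecs get no extra fragment.
std::string codecExtension(const std::string& codec) {
  if (codec == "snappy") return ".snappy";
  if (codec == "gzip") return ".gz";
  if (codec == "zstd") return ".zstd";
  if (codec == "lz4") return ".lz4";
  return "";
}

// Hypothetical helper: builds gluten-part-<uuid><codec-extension>.parquet,
// following the scheme described in this issue.
std::string makeGlutenFileName(
    const std::string& uuid,
    const std::string& codec) {
  return "gluten-part-" + uuid + codecExtension(codec) + ".parquet";
}
```

Under these assumptions, `makeGlutenFileName("0946dfb5-f773-42c9-ac8e-d4e70bede02b", "snappy")` yields `gluten-part-0946dfb5-f773-42c9-ac8e-d4e70bede02b.snappy.parquet`, which would then be passed as the `targetFileName` instead of relying on the default task/driver/plan-node format.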




---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to