[GitHub] [hudi] yihua commented on pull request #3952: [HUDI-2102]support hilbert curve for hudi.

GitBox Fri, 26 Nov 2021 21:15:01 -0800


yihua commented on pull request #3952:
URL: https://github.com/apache/hudi/pull/3952#issuecomment-980504819



   @xiarixiaoyao @vinothchandar I tried running clustering with the Hilbert 
curve in Spark Shell through the Spark datasource, but the Spark UI shows that 
Z-ordering is still being used.  Something seems to be off.
   
   <img width="1722" alt="Screen Shot 2021-11-26 at 21 11 34" 
src="https://user-images.githubusercontent.com/2497195/143668968-11b7f139-f0ba-493c-827a-5086c45b8390.png">
   
   ```
   import org.apache.hudi.QuickstartUtils._
   import scala.collection.JavaConversions._
   import org.apache.spark.sql.SaveMode._
   import org.apache.hudi.DataSourceReadOptions._
   import org.apache.hudi.DataSourceWriteOptions._
   import org.apache.hudi.config.HoodieWriteConfig._
   
   val tableName = "hudi_trips_cow"
   val basePath = "file:///tmp/hudi_trips_cow"
   val dataGen = new DataGenerator
   
   val inserts = convertToStringList(dataGen.generateInserts(1000))
   val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
   
   df.write.format("hudi").
     option("hoodie.insert.shuffle.parallelism", "2").
     option("hoodie.upsert.shuffle.parallelism", "2").
     option("hoodie.bulkinsert.shuffle.parallelism", "2").
     option("hoodie.delete.shuffle.parallelism", "2").
     option(PRECOMBINE_FIELD_OPT_KEY, "ts").
     option(RECORDKEY_FIELD_OPT_KEY, "uuid").
     option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
     option(TABLE_NAME, tableName).
     option("hoodie.parquet.small.file.limit", "0").
     option("hoodie.clustering.inline", "true").
     option("hoodie.clustering.inline.max.commits", "2").
     option("hoodie.clustering.plan.strategy.target.file.max.bytes", "1073741824").
     option("hoodie.clustering.plan.strategy.small.file.limit", "629145600").
     option("hoodie.clustering.plan.strategy.sort.columns", "rider,driver").
     option("hoodie.layout.optimize.enable", "true").
     option("hoodie.layout.optimize.strategy", "hilbert").
     mode(Append).
     save(basePath)
   ```
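
   To rule out a typo in the option keys, the clustering and layout settings 
from the write above can be collected into a plain `Map` and printed before the 
write. This is only a minimal sketch: `LayoutOptionsCheck` and `layoutOpts` are 
hypothetical names for illustration, not Hudi APIs, and the snippet only 
inspects what is passed in, not what Hudi ends up applying.

   ```scala
   // Hypothetical helper for inspection only; not part of Hudi.
   object LayoutOptionsCheck {
     // Same clustering/layout options as in the write above.
     val clusteringOpts: Map[String, String] = Map(
       "hoodie.clustering.inline" -> "true",
       "hoodie.clustering.inline.max.commits" -> "2",
       "hoodie.clustering.plan.strategy.target.file.max.bytes" -> "1073741824",
       "hoodie.clustering.plan.strategy.small.file.limit" -> "629145600",
       "hoodie.clustering.plan.strategy.sort.columns" -> "rider,driver",
       "hoodie.layout.optimize.enable" -> "true",
       "hoodie.layout.optimize.strategy" -> "hilbert"
     )

     // Keep only the layout-optimization keys.
     def layoutOpts(opts: Map[String, String]): Map[String, String] =
       opts.filter { case (key, _) => key.startsWith("hoodie.layout.optimize.") }

     // Print the layout-related options in a stable order.
     def main(args: Array[String]): Unit =
       layoutOpts(clusteringOpts).toSeq.sorted.foreach {
         case (key, value) => println(s"$key=$value")
       }
   }
   ```

   In Spark Shell the same map could then be passed in one call with 
`df.write.format("hudi").options(LayoutOptionsCheck.clusteringOpts)`, so the 
values printed are exactly the values handed to the writer.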
   
   


