lamber-ken edited a comment on pull request #1469:
URL: https://github.com/apache/incubator-hudi/pull/1469#issuecomment-626196481
Hi @vinothchandar @nsivabalan, thanks for your review — all review comments have been
addressed.
### SYNC
| task | status |
| ---- | ---- |
| fix an implicit bug that caused input records to be duplicated | done, JUnit covered |
| implement fetchRecordLocation | done, JUnit covered |
| JUnit tests for TestHoodieBloomIndexV2 | done |
| JUnit test for TestHoodieBloomRangeInfoHandle | done |
| revert HoodieTimer | done |
| global HoodieBloomIndexV2 | done, JUnit covered |
### Test
```
// Launch spark-shell with the Hudi bundle.
// Tested with index types: BLOOM, GLOBAL_BLOOM, BLOOM_V2, GLOBAL_BLOOM_V2
export SPARK_HOME=/work/BigData/install/spark/spark-2.4.4-bin-hadoop2.7
${SPARK_HOME}/bin/spark-shell \
  --jars `ls packaging/hudi-spark-bundle/target/hudi-spark-bundle_*.*-*.*.*-SNAPSHOT.jar` \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'

import org.apache.spark.sql.functions._

val tableName = "hudi_mor_table"
val basePath = "file:///tmp/hudi_mor_table"
val hudiOptions = Map[String, String](
  "hoodie.upsert.shuffle.parallelism" -> "10",
  "hoodie.datasource.write.recordkey.field" -> "name",
  "hoodie.datasource.write.partitionpath.field" -> "location",
  "hoodie.table.name" -> tableName,
  "hoodie.datasource.write.precombine.field" -> "ts",
  "hoodie.index.type" -> "BLOOM_V2"
)

// insert: two records with the same key, in different partitions
var datas = List(
  """{ "name": "kenken1", "ts": 1574297893836, "age": 123, "location": "2019-03-01"}""",
  """{ "name": "kenken1", "ts": 1574297893836, "age": 123, "location": "2019-03-02"}"""
)
val inputDF = spark.read.json(spark.sparkContext.parallelize(datas, 2))
inputDF.write.format("org.apache.hudi").
  options(hudiOptions).
  mode("Overwrite").
  save(basePath)
spark.read.format("org.apache.hudi").load(basePath + "/*/*").show()

// update: newer ts for the record in partition 2019-03-01
datas = List(
  """{ "name": "kenken1", "ts": 1574297893838, "age": 100, "location": "2019-03-01"}"""
)
val updateDF = spark.read.json(spark.sparkContext.parallelize(datas, 2))
updateDF.write.format("org.apache.hudi").
  options(hudiOptions).
  mode("Append").
  save(basePath)
spark.read.format("org.apache.hudi").load(basePath + "/*/*").show()
```
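The script above exercises only `BLOOM_V2`, while the comment at the top lists all four index types covered by this PR. A minimal variant for trying the global index — assuming only the `hoodie.index.type` value needs to change, with everything else in the script kept as-is:

```scala
// Same options map as above, swapping in the global V2 index.
// A global index matches the record key across all partitions,
// not just within the incoming record's partition path.
val hudiOptions = Map[String, String](
  "hoodie.upsert.shuffle.parallelism" -> "10",
  "hoodie.datasource.write.recordkey.field" -> "name",
  "hoodie.datasource.write.partitionpath.field" -> "location",
  "hoodie.table.name" -> tableName,
  "hoodie.datasource.write.precombine.field" -> "ts",
  "hoodie.index.type" -> "GLOBAL_BLOOM_V2" // or BLOOM, GLOBAL_BLOOM, BLOOM_V2
)
```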