soumilshah1995 opened a new issue, #8977:
URL: https://github.com/apache/hudi/issues/8977

   I am trying out the Hudi data quality feature described at
   https://hudi.apache.org/docs/precommit_validator/#sql-query-single-result
   
   I have a field `message`, and I expected Hudi to throw an error on append
   operations when `message` is null.
   Here is the sample code:
   ```
   try:
       import sys, random, uuid
       from pyspark.context import SparkContext
       from pyspark.sql.session import SparkSession
       from awsglue.context import GlueContext
       from awsglue.job import Job
       from awsglue.dynamicframe import DynamicFrame
       from awsglue.utils import getResolvedOptions
   except Exception as e:
       print("Modules are missing: {} ".format(e))
   
   spark = (SparkSession.builder
            .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
            .config('spark.sql.hive.convertMetastoreParquet', 'false')
            .config('spark.sql.catalog.spark_catalog', 'org.apache.spark.sql.hudi.catalog.HoodieCatalog')
            .config('spark.sql.extensions', 'org.apache.spark.sql.hudi.HoodieSparkSessionExtension')
            .config('spark.sql.legacy.pathOptionBehavior.enabled', 'true')
            .getOrCreate())
   
   sc = spark.sparkContext
   glueContext = GlueContext(sc)
   job = Job(glueContext)
   logger = glueContext.get_logger()
   
   # Create sample data
   
   db_name = "hudidb"
   table_name = "test"
   recordkey = 'uuid'
   precombine = 'precomb'
   
   path = f"s3://soumilshah-hudi-demos/silver/table_name={table_name}/"
   method = 'upsert'
   table_type = "COPY_ON_WRITE"
   validator_query = """select count(*) from <TABLE_NAME> where message=null;"""
   
   
   hudi_options = {
       'hoodie.table.name': table_name,
       'hoodie.datasource.write.recordkey.field': recordkey,
       'hoodie.datasource.write.table.name': table_name,
       'hoodie.datasource.write.operation': method,
       'hoodie.datasource.write.precombine.field': precombine,
       'hoodie.upsert.shuffle.parallelism': 2,
       'hoodie.insert.shuffle.parallelism': 2,
       "hoodie.precommit.validators": "org.apache.hudi.client.validator.SqlQueryEqualityPreCommitValidator",
       "hoodie.precommit.validators.equality.sql.queries": validator_query
   }
   
   try:
       print("Trying Append 1")
       spark_df = spark.createDataFrame(
           data=[
               (1, "This is APPEND 1", 111, "1"),
               (2, "This is APPEND 2", 222, "2"),
               (3, "This is APPEND 5", 222, "3"),
           ],
           schema=["uuid", "message", "precomb", "partition"])
       spark_df.show()
       spark_df.write.format("hudi"). \
           options(**hudi_options). \
           mode("append"). \
           save(path)
       print("Append 1 Success....")
   except Exception as e:
       print("Failed to UPSERT", e)
   
   
   try:
       print("Trying Append 2")
       spark_df = spark.createDataFrame(
           data=[
               (4, None, 444, None),
               (5, "This is APPEND 5", 555, "5"),
           ],
           schema=["uuid", "message", "precomb", "partition"])
       spark_df.show()
       spark_df.write.format("hudi"). \
           options(**hudi_options). \
           mode("append"). \
           save(path)
       print("Append 2 Success....")
   
   except Exception as e:
       print("Failed to UPSERT", e)
   
   ```
   I was expecting to see an error message, but no error was thrown. Did I do
   something wrong? If so, could someone point me in the right direction?
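
One thing that may be worth checking in the validator query, offered as a sketch and not a confirmed diagnosis: in standard SQL, `message=null` evaluates to NULL for every row, so a `where` clause written that way matches nothing, while `message is null` matches rows whose value is missing. The snippet below illustrates this with Python's built-in `sqlite3` as a stand-in for Spark SQL. The table `t` and the helper `count_matching` are hypothetical names used only for illustration, and it assumes Spark SQL applies the same NULL comparison semantics.

```python
import sqlite3


def count_matching(where_clause):
    """Count rows of a tiny in-memory table that satisfy the given WHERE clause."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table mirroring the two rows written in "Append 2".
        conn.execute("CREATE TABLE t (uuid INTEGER, message TEXT)")
        conn.executemany(
            "INSERT INTO t VALUES (?, ?)",
            [(4, None), (5, "This is APPEND 5")],
        )
        query = f"SELECT count(*) FROM t WHERE {where_clause}"
        (count,) = conn.execute(query).fetchone()
        return count
    finally:
        conn.close()


# Comparing with "=" against NULL never evaluates to true, so no row matches.
equals_null = count_matching("message = NULL")
# "IS NULL" is the predicate that actually matches rows with a missing value.
is_null = count_matching("message IS NULL")
print(equals_null, is_null)
```

If the same holds in the validator, the count would stay the same before and after the commit whether or not null messages are written. The linked docs section also appears to describe `SqlQuerySingleResultPreCommitValidator`, not the equality validator configured above, so the validator class may be worth checking as well.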
   
   Regards
   Soumil
   
   
   References 
   https://hudi.apache.org/docs/precommit_validator/#sql-query-single-result
   

