soumilshah1995 commented on issue #9764:
URL: https://github.com/apache/hudi/issues/9764#issuecomment-1730362628
I'm running into a different issue after setting these environment variables:
```
import os
os.environ['AWS_ACCESS_KEY_ID'] = "XXXX"
os.environ['AWS_ACCESS_KEY'] = "XX"
os.environ['AWS_SECRET_ACCESS_KEY'] = "XX"
os.environ['AWS_SECRET_KEY'] = "XX"
```
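As a side note, the S3A credential provider chain reads the standard variable names (`AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`); a hypothetical helper like the one below (not from the original post) can fail fast on typos before the environment is created:

```python
import os

def missing_aws_credentials(environ=os.environ):
    """Return the standard AWS credential variables that are unset.

    Hypothetical helper: the S3A provider chain reads AWS_ACCESS_KEY_ID /
    AWS_SECRET_ACCESS_KEY; the *_KEY variants above are not part of it.
    """
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    return [name for name in required if not environ.get(name)]
```

Calling this before `TableEnvironment.create` and raising on a non-empty result makes a missing or misspelled credential obvious.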
# Code
```
from pyflink.table import EnvironmentSettings, TableEnvironment
import os

# Create a batch TableEnvironment
env_settings = EnvironmentSettings.in_batch_mode()
table_env = TableEnvironment.create(env_settings)

# Get the current working directory
CURRENT_DIR = os.getcwd()

# Define the list of JAR file names to add
jar_files = [
    "hudi-flink-bundle_2.12-0.10.1.jar",
    "flink-s3-fs-hadoop-1.16.1.jar",
    # "hudi-flink1.16-bundle-0.13.0.jar",
    "flink-sql-connector-kinesis-1.16.1.jar"
]

# Build the list of JAR URLs; CURRENT_DIR is already absolute (starts with
# '/'), so prefix with 'file://' to avoid malformed 'file:////...' URLs
jar_urls = [f"file://{CURRENT_DIR}/{jar_file}" for jar_file in jar_files]

table_env.get_config().get_configuration().set_string(
    "pipeline.jars",
    ";".join(jar_urls)
)
#hudi_output_path = 'file://' + os.path.join(os.getcwd(), 'output')
hudi_output_path = 's3a://datateam-sandbox-qa-demo/tmp/'
hudi_sink = f"""
CREATE TABLE t1(
uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
name VARCHAR(10),
`partition` VARCHAR(20)
)
PARTITIONED BY (`partition`)
WITH (
'connector' = 'hudi',
'path' = '{hudi_output_path}',
'table.type' = 'MERGE_ON_READ'
);
"""
table_env.execute_sql(hudi_sink)
insert_into_hudi_sink_query = """
INSERT INTO t1 VALUES
    ('id1', 'Danny', 'par1'),
    ('id2', 'Stephen', 'par1');
"""

# execute_sql submits the INSERT job asynchronously; in a short-lived
# script, block until the job finishes so the files are actually written
table_env.execute_sql(insert_into_hudi_sink_query).wait()
```
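One detail worth double-checking in the setup above: Flink expects `pipeline.jars` to be a semicolon-separated list of URLs, and since `os.getcwd()` returns an absolute path already beginning with `/`, the prefix only needs two slashes. A minimal sketch of the value being built (the directory is a stand-in, not from the original post):

```python
# Stand-in for os.getcwd(); an absolute path already starts with '/'
current_dir = "/home/user/jars"
jar_files = ["hudi-flink-bundle_2.12-0.10.1.jar", "flink-s3-fs-hadoop-1.16.1.jar"]

# 'file://' + absolute path yields well-formed 'file:///home/...' URLs;
# multiple jar URLs are joined with ';'
pipeline_jars = ";".join(f"file://{current_dir}/{jar}" for jar in jar_files)
print(pipeline_jars)
# file:///home/user/jars/hudi-flink-bundle_2.12-0.10.1.jar;file:///home/user/jars/flink-s3-fs-hadoop-1.16.1.jar
```

With the original `file:///` prefix plus an absolute path, the URLs come out as `file:////...`, which some URL parsers reject.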
I see the folder created on S3, but I don't see any parquet files. Any idea why?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]