zhangjun0x01 commented on issue #1914:
URL: https://github.com/apache/iceberg/issues/1914#issuecomment-747146918


   > @HeartSaVioR the reason why I need to use both Spark and Flink is that:
   > 
   > 1. Flink does not support partitioning by hour
   >    
   >    ![image](https://user-images.githubusercontent.com/5066512/102064596-765c9d00-3e32-11eb-8f7e-081d42afeadb.png)
   >    
   >    and my table is very large, so that is not a good practice.
   > 2. Flink SQL is standalone. My table is huge, so I need more resources to run the SQL, and therefore chose Spark.
   > 
   > We use Flink to write stream data to COS; it is efficient.
   
   
   
   1. You can create an Iceberg table with Flink SQL:
    ```
   CREATE TABLE iceberg.iceberg_db.iceberg_table (
     id BIGINT COMMENT 'unique id',
     data STRING,
     `day` INT,
     `hour` INT)
   PARTITIONED BY (`day`, `hour`)
   WITH ('connector'='iceberg', 'write.format.default'='orc');
   ```
   
   and write into the Iceberg table with a Flink streaming SQL query:
   
   ```
   INSERT INTO iceberg.iceberg_db.iceberg_table
   SELECT id, data, DAYOFMONTH(your_timestamp), HOUR(your_timestamp) FROM kafka_table
   ```
   
   
   2. The Flink SQL client can use a standalone cluster or a YARN session cluster: you can start a YARN session cluster first, and then submit the Flink SQL job to that session cluster.
   3. You can set the parallelism for every Flink SQL job with the command:
   ```
   SET table.exec.resource.default-parallelism = 100;
   ```
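
   For point 2, the YARN session workflow could look roughly like the following sketch. The flags and memory sizes shown are assumptions and depend on your Flink version and installation layout:
   ```
   # Start a detached YARN session cluster
   # (assumes a configured Hadoop classpath; -jm/-tm sizes are examples)
   ./bin/yarn-session.sh -d -jm 2048m -tm 4096m

   # Launch the SQL client; it attaches to the session cluster recorded
   # in the .yarn-properties file that yarn-session.sh wrote above
   ./bin/sql-client.sh embedded
   ```
   Jobs submitted from that SQL client session then run on the YARN session cluster rather than on a local standalone cluster.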


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


