stevenzwu commented on code in PR #7571: URL: https://github.com/apache/iceberg/pull/7571#discussion_r1190615363
##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/FlinkReadOptions.java:
##########
@@ -105,4 +105,8 @@ private FlinkReadOptions() {}
public static final String LIMIT = "limit";
public static final ConfigOption<Long> LIMIT_OPTION =
ConfigOptions.key(PREFIX + LIMIT).longType().defaultValue(-1L);
+
+ public static final String PLAN_RETRY_NUM = "plan-retry-num";
Review Comment:
Suggested a different name in the doc; the same applies to the other variables and methods.
##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/FlinkReadOptions.java:
##########
@@ -105,4 +105,8 @@ private FlinkReadOptions() {}
public static final String LIMIT = "limit";
public static final ConfigOption<Long> LIMIT_OPTION =
ConfigOptions.key(PREFIX + LIMIT).longType().defaultValue(-1L);
+
+ public static final String PLAN_RETRY_NUM = "plan-retry-num";
Review Comment:
Suggested a different name in the doc.
##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/source/enumerator/ContinuousIcebergEnumerator.java:
##########
@@ -115,7 +116,12 @@ private ContinuousEnumerationResult discoverSplits() {
       return new ContinuousEnumerationResult(
           Collections.emptyList(), enumeratorPosition.get(), enumeratorPosition.get());
     } else {
-      return splitPlanner.planSplits(enumeratorPosition.get());
+      AtomicReference<ContinuousEnumerationResult> result = new AtomicReference<>();
+      Tasks.foreach(enumeratorPosition.get())
Review Comment:
   I was thinking more along the lines of a max allowed number of consecutive planning failures (not retries), because planning is already scheduled periodically.
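
   The consecutive-failure idea could be sketched roughly as below; the class and method names are hypothetical (not from the PR), and the enumerator would call the error/success hooks from its periodic planning callback:

   ```java
   // Hypothetical sketch: fail the job only after too many *consecutive*
   // planning failures, instead of retrying within a single planning cycle.
   public class PlanningFailureTracker {
     private final int maxAllowedConsecutiveFailures; // -1 means never fail the job
     private int consecutiveFailures = 0;

     public PlanningFailureTracker(int maxAllowedConsecutiveFailures) {
       this.maxAllowedConsecutiveFailures = maxAllowedConsecutiveFailures;
     }

     public void onPlanningError() {
       consecutiveFailures++;
     }

     public void onPlanningSuccess() {
       // Any successful discovery resets the count; only consecutive failures matter.
       consecutiveFailures = 0;
     }

     public boolean shouldFailJob() {
       return maxAllowedConsecutiveFailures != -1
           && consecutiveFailures > maxAllowedConsecutiveFailures;
     }
   }
   ```

   Since planning already runs on a fixed interval, counting consecutive failures bounds how long the source can stay broken without relying on an inner retry loop.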
##########
docs/flink-configuration.md:
##########
@@ -110,27 +110,28 @@ env.getConfig()
`Read option` has the highest priority, followed by `Flink configuration` and then `Table property`.
-| Read option | Flink configuration | Table property | Default | Description |
-| --------------------------- | --------------------------------------------- | ---------------------------- | -------------------------------- | ------------------------------------------------------------ |
-| snapshot-id | N/A | N/A | null | For time travel in batch mode. Read data from the specified snapshot-id. |
-| case-sensitive | connector.iceberg.case-sensitive | N/A | false | If true, match column name in a case sensitive way. |
-| as-of-timestamp | N/A | N/A | null | For time travel in batch mode. Read data from the most recent snapshot as of the given time in milliseconds. |
+| Read option | Flink configuration | Table property | Default | Description |
+| --------------------------- | --------------------------------------------- | ---------------------------- | -------------------------------- | ------------------------------------------------------------ |
+| snapshot-id | N/A | N/A | null | For time travel in batch mode. Read data from the specified snapshot-id. |
+| case-sensitive | connector.iceberg.case-sensitive | N/A | false | If true, match column name in a case sensitive way. |
+| as-of-timestamp | N/A | N/A | null | For time travel in batch mode. Read data from the most recent snapshot as of the given time in milliseconds. |
| starting-strategy | connector.iceberg.starting-strategy | N/A | INCREMENTAL_FROM_LATEST_SNAPSHOT | Starting strategy for streaming execution. TABLE_SCAN_THEN_INCREMENTAL: Do a regular table scan then switch to the incremental mode. The incremental mode starts from the current snapshot exclusive. INCREMENTAL_FROM_LATEST_SNAPSHOT: Start incremental mode from the latest snapshot inclusive. If it is an empty map, all future append snapshots should be discovered. INCREMENTAL_FROM_EARLIEST_SNAPSHOT: Start incremental mode from the earliest snapshot inclusive. If it is an empty map, all future append snapshots should be discovered. INCREMENTAL_FROM_SNAPSHOT_ID: Start incremental mode from a snapshot with a specific id inclusive. INCREMENTAL_FROM_SNAPSHOT_TIMESTAMP: Start incremental mode from a snapshot with a specific timestamp inclusive. If the timestamp is between two snapshots, it should start from the snapshot after the timestamp. Just for FLIP27 Source. |
-| start-snapshot-timestamp | N/A | N/A | null | Start to read data from the most recent snapshot as of the given time in milliseconds. |
-| start-snapshot-id | N/A | N/A | null | Start to read data from the specified snapshot-id. |
-| end-snapshot-id | N/A | N/A | The latest snapshot id | Specifies the end snapshot. |
-| branch | N/A | N/A | main | Specifies the branch to read from in batch mode |
-| tag | N/A | N/A | null | Specifies the tag to read from in batch mode |
-| start-tag | N/A | N/A | null | Specifies the starting tag to read from for incremental reads |
-| end-tag | N/A | N/A | null | Specifies the ending tag to read from for incremental reads |
-| split-size | connector.iceberg.split-size | read.split.target-size | 128 MB | Target size when combining input splits. |
-| split-lookback | connector.iceberg.split-file-open-cost | read.split.planning-lookback | 10 | Number of bins to consider when combining input splits. |
-| split-file-open-cost | connector.iceberg.split-file-open-cost | read.split.open-file-cost | 4MB | The estimated cost to open a file, used as a minimum weight when combining splits. |
-| streaming | connector.iceberg.streaming | N/A | false | Sets whether the current task runs in streaming or batch mode. |
-| monitor-interval | connector.iceberg.monitor-interval | N/A | 60s | Monitor interval to discover splits from new snapshots. Applicable only for streaming read. |
-| include-column-stats | connector.iceberg.include-column-stats | N/A | false | Create a new scan from this that loads the column stats with each data file. Column stats include: value count, null value count, lower bounds, and upper bounds. |
-| max-planning-snapshot-count | connector.iceberg.max-planning-snapshot-count | N/A | Integer.MAX_VALUE | Max number of snapshots limited per split enumeration. Applicable only to streaming read. |
-| limit | connector.iceberg.limit | N/A | -1 | Limited output number of rows. |
+| start-snapshot-timestamp | N/A | N/A | null | Start to read data from the most recent snapshot as of the given time in milliseconds. |
+| start-snapshot-id | N/A | N/A | null | Start to read data from the specified snapshot-id. |
+| end-snapshot-id | N/A | N/A | The latest snapshot id | Specifies the end snapshot. |
+| branch | N/A | N/A | main | Specifies the branch to read from in batch mode |
+| tag | N/A | N/A | null | Specifies the tag to read from in batch mode |
+| start-tag | N/A | N/A | null | Specifies the starting tag to read from for incremental reads |
+| end-tag | N/A | N/A | null | Specifies the ending tag to read from for incremental reads |
+| split-size | connector.iceberg.split-size | read.split.target-size | 128 MB | Target size when combining input splits. |
+| split-lookback | connector.iceberg.split-file-open-cost | read.split.planning-lookback | 10 | Number of bins to consider when combining input splits. |
+| split-file-open-cost | connector.iceberg.split-file-open-cost | read.split.open-file-cost | 4MB | The estimated cost to open a file, used as a minimum weight when combining splits. |
+| streaming | connector.iceberg.streaming | N/A | false | Sets whether the current task runs in streaming or batch mode. |
+| monitor-interval | connector.iceberg.monitor-interval | N/A | 60s | Monitor interval to discover splits from new snapshots. Applicable only for streaming read. |
+| include-column-stats | connector.iceberg.include-column-stats | N/A | false | Create a new scan from this that loads the column stats with each data file. Column stats include: value count, null value count, lower bounds, and upper bounds. |
+| max-planning-snapshot-count | connector.iceberg.max-planning-snapshot-count | N/A | Integer.MAX_VALUE | Max number of snapshots limited per split enumeration. Applicable only to streaming read. |
+| limit | connector.iceberg.limit | N/A | -1 | Limited output number of rows. |
+| plan-retry-num | connector.iceberg.plan-retry-num | N/A | 3 | Retries before job failure when there is an error during continuous planning. Set to -1 to not fail on error. |
Review Comment:
   `max-allowed-consecutive-planning-failures`? I know it is a bit long, but it is probably clearer.
   Regarding the description, `retries` is not accurate. How about the following?
   ```
   Max allowed consecutive failures for scan planning before failing the job.
   Set to -1 for never failing the job for scan planning failure.
   ```
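
   Following the existing pattern in `FlinkReadOptions` (visible in the diff above, e.g. `LIMIT_OPTION`), the renamed option might be declared roughly as follows. This is only a sketch: the constant names are illustrative, and the `3` default simply mirrors the value in the proposed doc table.

   ```java
   import org.apache.flink.configuration.ConfigOption;
   import org.apache.flink.configuration.ConfigOptions;

   // Illustrative sketch of the renamed option, not the PR's actual code.
   public class FlinkReadOptionsSketch {
     // Same prefix the existing options use ("connector.iceberg." per the doc table).
     private static final String PREFIX = "connector.iceberg.";

     public static final String MAX_ALLOWED_PLANNING_FAILURES =
         "max-allowed-consecutive-planning-failures";

     // intType() fits a failure count better than longType(); -1 disables failing.
     public static final ConfigOption<Integer> MAX_ALLOWED_PLANNING_FAILURES_OPTION =
         ConfigOptions.key(PREFIX + MAX_ALLOWED_PLANNING_FAILURES).intType().defaultValue(3);
   }
   ```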
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
