MonsterChenzhuo commented on issue #2808: URL: https://github.com/apache/incubator-seatunnel/issues/2808#issuecomment-1539849317
Regarding the definition of the partitioning (split) parameters:

scan.partition.column: Partition column name. This option specifies the column used for partitioning. Data is split into multiple partitions based on the values of this column, which enables parallel reading. Typically, you should choose a column whose values are well distributed and that is related to the query conditions.

scan.partition.num: Number of partitions. This option specifies how many partitions the data source is split into. A larger value can increase parallelism and thereby improve read speed, but may also increase the demand for memory and compute resources. A smaller value may result in slower reads but lower resource usage. Adjust this value according to actual requirements and available resources.

scan.partition.lower-bound: Lower bound of the partition column. This option specifies the minimum value of the partition column. It defines the start of the partitioned range, so that all rows whose partition column value is greater than or equal to this lower bound are read.

scan.partition.upper-bound: Upper bound of the partition column. This option specifies the maximum value of the partition column. It defines the end of the partitioned range, so that all rows whose partition column value is less than or equal to this upper bound are read.
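To make the interaction of the four options concrete, here is a minimal sketch of how an inclusive range could be divided into per-partition queries. It is an illustration only, not the connector's actual implementation: it assumes an integer partition column and even splitting, and the names `split_range` and `partition_queries`, the table `orders`, and the column `id` are all hypothetical.

```python
from typing import List, Tuple


def split_range(lower_bound: int, upper_bound: int, partition_num: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [lower_bound, upper_bound] into contiguous,
    non-overlapping inclusive sub-ranges, at most partition_num of them."""
    if partition_num < 1:
        raise ValueError("partition_num must be at least 1")
    if lower_bound > upper_bound:
        raise ValueError("lower_bound must not exceed upper_bound")

    total = upper_bound - lower_bound + 1
    # Never create more partitions than there are distinct values.
    parts = min(partition_num, total)
    base, extra = divmod(total, parts)

    ranges = []
    start = lower_bound
    for i in range(parts):
        # The first `extra` partitions each take one additional value.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def partition_queries(table: str, column: str, lower_bound: int,
                      upper_bound: int, partition_num: int) -> List[str]:
    """Build one query per partition (hypothetical helper; the SQL shape is an assumption)."""
    return [
        f"SELECT * FROM {table} WHERE {column} BETWEEN {lo} AND {hi}"
        for lo, hi in split_range(lower_bound, upper_bound, partition_num)
    ]


if __name__ == "__main__":
    # Hypothetical values standing in for scan.partition.column = id,
    # scan.partition.lower-bound = 1, scan.partition.upper-bound = 100,
    # scan.partition.num = 4.
    for query in partition_queries("orders", "id", 1, 100, 4):
        print(query)
```

In this sketch each query can be handed to a separate reader, which is where the parallelism comes from. A skewed partition column would leave some of these ranges with far more rows than others, which is why a well-distributed column is recommended.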
