JingsongLi commented on code in PR #167:
URL: https://github.com/apache/flink-table-store/pull/167#discussion_r903568556
##########
docs/content/docs/development/rescale-bucket.md:
##########
@@ -0,0 +1,141 @@
+---
+title: "Rescale Bucket"
+weight: 5
+type: docs
+aliases:
+- /development/rescale-bucket.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Rescale Bucket
+
+Since the number of total buckets dramatically influences the performance, Table Store allows users to
+tune bucket numbers by `ALTER TABLE` command and reorganize data layout by `INSERT OVERWRITE`
+without recreating the table/partition. When executing overwrite jobs, the framework will automatically
+scan the data with the old bucket number and hash the record according to the current bucket number.
+
+## Rescale Overwrite
+```sql
+-- scale number of total buckets

Review Comment:
   `scale` -> `rescale`

##########
docs/content/docs/development/rescale-bucket.md:
##########
@@ -0,0 +1,141 @@
+---
+title: "Rescale Bucket"
+weight: 5
+type: docs
+aliases:
+- /development/rescale-bucket.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.
The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Rescale Bucket
+
+Since the number of total buckets dramatically influences the performance, Table Store allows users to
+tune bucket numbers by `ALTER TABLE` command and reorganize data layout by `INSERT OVERWRITE`
+without recreating the table/partition. When executing overwrite jobs, the framework will automatically
+scan the data with the old bucket number and hash the record according to the current bucket number.
+
+## Rescale Overwrite
+```sql
+-- scale number of total buckets
+ALTER TABLE table_identifier SET ('bucket' = '...')
+
+-- reorganize data layout of table/partition
+INSERT OVERWRITE table_identifier [PARTITION (part_spec)]
+SELECT ...
+FROM table_identifier
+[WHERE part_spec]
+```
+
+Please note that
+- `ALTER TABLE` only modifies the table's metadata and will **NOT** reorganize or reformat existing data.
+ Reorganize exiting data must be achieved by `INSERT OVERWRITE`.
+- Scale bucket number does not influence the read and running write jobs.

Review Comment:
   `Scale` -> `Rescale`

##########
docs/content/docs/development/rescale-bucket.md:
##########
@@ -0,0 +1,141 @@
+---
+title: "Rescale Bucket"
+weight: 5
+type: docs
+aliases:
+- /development/rescale-bucket.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.
The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Rescale Bucket
+
+Since the number of total buckets dramatically influences the performance, Table Store allows users to
+tune bucket numbers by `ALTER TABLE` command and reorganize data layout by `INSERT OVERWRITE`
+without recreating the table/partition. When executing overwrite jobs, the framework will automatically
+scan the data with the old bucket number and hash the record according to the current bucket number.
+
+## Rescale Overwrite
+```sql
+-- scale number of total buckets
+ALTER TABLE table_identifier SET ('bucket' = '...')
+
+-- reorganize data layout of table/partition
+INSERT OVERWRITE table_identifier [PARTITION (part_spec)]
+SELECT ...
+FROM table_identifier
+[WHERE part_spec]
+```
+
+Please note that
+- `ALTER TABLE` only modifies the table's metadata and will **NOT** reorganize or reformat existing data.
+ Reorganize exiting data must be achieved by `INSERT OVERWRITE`.
+- Scale bucket number does not influence the read and running write jobs.
+- Once the bucket number is changed, any newly scheduled `INSERT INTO` jobs without reorganize existing table/partition
+ will throw a `TableException` with message like
+ ```text
+ Try to write table/partition ... with a new bucket num ...,
+ but the previous bucket num is ... Please switch to batch mode,
+ and perform INSERT OVERWRITE to rescale current data layout first.
+ ```
+- For partitioned table, it is possible to have different bucket number for different partitions. *E.g.*
+ ```sql
+ ALTER TABLE my_table SET ('bucket' = '4');
+ INSERT OVERWRITE my_table PARTITION (dt = '2022-01-01')
+ SELECT * FROM ...;
+
+ ALTER TABLE my_table SET ('bucket' = '8');
+ INSERT OVERWRITE my_table PARTITION (dt = '2022-01-02')
+ SELECT * FROM ...;
+ ```
+- During overwrite period, make sure there are no other jobs writing the same table/partition.
+
+{{< hint info >}}
+__Note:__ For the table which enables log system(*e.g.* Kafka), please scale the topic's partition as well to keep consistency.

Review Comment:
   `scale the topic's partition` -> `rescale the topic's partition`

##########
docs/content/docs/development/rescale-bucket.md:
##########
@@ -0,0 +1,141 @@
+---
+title: "Rescale Bucket"
+weight: 5
+type: docs
+aliases:
+- /development/rescale-bucket.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Rescale Bucket
+
+Since the number of total buckets dramatically influences the performance, Table Store allows users to
+tune bucket numbers by `ALTER TABLE` command and reorganize data layout by `INSERT OVERWRITE`
+without recreating the table/partition.
When executing overwrite jobs, the framework will automatically
+scan the data with the old bucket number and hash the record according to the current bucket number.
+
+## Rescale Overwrite
+```sql
+-- scale number of total buckets
+ALTER TABLE table_identifier SET ('bucket' = '...')
+
+-- reorganize data layout of table/partition
+INSERT OVERWRITE table_identifier [PARTITION (part_spec)]
+SELECT ...
+FROM table_identifier
+[WHERE part_spec]
+```
+
+Please note that
+- `ALTER TABLE` only modifies the table's metadata and will **NOT** reorganize or reformat existing data.
+ Reorganize exiting data must be achieved by `INSERT OVERWRITE`.
+- Scale bucket number does not influence the read and running write jobs.
+- Once the bucket number is changed, any newly scheduled `INSERT INTO` jobs without reorganize existing table/partition

Review Comment:
   `without reorganize` -> `without reorganized`?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
