LadyForest commented on code in PR #167:
URL: https://github.com/apache/flink-table-store/pull/167#discussion_r902611941
##########
docs/content/docs/development/write-table.md:
##########
@@ -192,3 +192,95 @@ There are three main places in the Table Store's sink writer that take up memory
- The memory consumed by compaction for reading files, it can be adjusted by the `num-sorted-run.compaction-trigger` option to change the maximum number of files to be merged.
- The memory consumed by writing file, which is not adjustable.
+
+
+## Scale Bucket
+
+Since the LSM trees are built against each bucket, the number of total buckets dramatically influences the performance.
+Table Store allows users to tune bucket numbers by `ALTER TABLE` command and reorganize data layout by `INSERT OVERWRITE`
+without recreating the table/partition. When executing overwrite jobs, the framework will automatically scan the data with
+the bucket number recorded in manifest file and hash the record according to the current bucket numbers.
+
+#### Rescale Overwrite
+```sql
+-- scale number of total buckets
+ALTER TABLE table_identifier SET ('bucket' = '...')
+
+-- reorganize data layout of table/partition
+INSERT OVERWRITE table_identifier [PARTITION (part_spec)]
+SELECT ...
+FROM table_identifier
+[WHERE part_spec]
+```
+
+Please note that
+- `ALTER TABLE` only modifies the table's metadata and will **NOT** reorganize or reformat existing data.
+ Reorganizing existing data must be achieved by `INSERT OVERWRITE`.
+- Rescaling the bucket number does not influence the read and running write jobs.
+- Once the bucket number is changed, any new `INSERT INTO` jobs without reorganizing the table/partition
Review Comment:
> You can say that reading is perfectly fine as long as this partition does not continue to write data.
Well, I think the reading task is always fine because the read scan always
uses the manifest's bucket number?
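
To make the scenario concrete, here is a minimal sketch of the rescale flow from the quoted docs. The table `orders`, its columns `order_id` and `amount`, the partition column `dt`, and the bucket value `4` are all hypothetical and only serve as an illustration.

```sql
-- Hypothetical table, columns, and values; for illustration only.

-- Step 1: change the bucket number in the table's metadata.
-- Per the quoted docs, this does not rewrite existing data files.
ALTER TABLE orders SET ('bucket' = '4');

-- Step 2: overwrite a single partition so that its records are
-- rehashed according to the new bucket number.
INSERT OVERWRITE orders PARTITION (dt = '2022-06-01')
SELECT order_id, amount
FROM orders
WHERE dt = '2022-06-01';
```

If my understanding is right, partitions that have not been overwritten would still be scanned with the bucket number recorded in their manifest files, which is why reads should stay consistent between step 1 and step 2.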