[
https://issues.apache.org/jira/browse/PHOENIX-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842749#comment-17842749
]
Viraj Jasani commented on PHOENIX-7309:
---------------------------------------
{quote}HBase provides API to create a table with split points text file.
{quote}
HBase only provides this as a shell capability; there is no public Admin API that
lets clients pass a split points file as input while creating the table.
This is what the shell does when SPLITS_FILE is provided:
{code:ruby}
if arg.key?(SPLITS_FILE)
  splits_file = arg.delete(SPLITS_FILE)
  unless File.exist?(splits_file)
    raise(ArgumentError, "Splits file #{splits_file} doesn't exist")
  end
  arg[SPLITS] = []
  File.foreach(splits_file) do |line|
    arg[SPLITS].push(line.chomp)
  end
  tdb.setValue(SPLITS_FILE, splits_file)
end
if arg.key?(SPLITS)
  splits = Java::byte[][arg[SPLITS].size].new
  idx = 0
  arg.delete(SPLITS).each do |split|
    splits[idx] = org.apache.hadoop.hbase.util.Bytes.toBytesBinary(split)
    idx += 1
  end
elsif arg.key?(NUMREGIONS) || arg.key?(SPLITALGO)
  ...
  ...
  ...
if splits.nil?
  # Perform the create table call
  @admin.createTable(tdb.build)
else
  # Perform the create table call
  @admin.createTable(tdb.build, splits)
end{code}
Here, the shell builds an array of byte[] split keys and uses the public Admin API:
{code:java}
/**
 * Creates a new table with an initial set of empty regions defined by the specified split keys.
 * The total number of regions created will be the number of split keys plus one. Synchronous
 * operation. Note : Avoid passing empty split key.
 * @param desc table descriptor for table
 * @param splitKeys array of split keys for the initial regions of the table
 * @throws IllegalArgumentException if the table name is reserved, if the split keys are
 *           repeated and if the split key has empty byte array.
 * @throws org.apache.hadoop.hbase.MasterNotRunningException if master is not running
 * @throws TableExistsException if table already exists (If concurrent threads, the table may
 *           have been created between test-for-existence and attempt-at-creation).
 * @throws IOException if a remote or network exception occurs
 */
default void createTable(TableDescriptor desc, byte[][] splitKeys) throws IOException {
  get(createTableAsync(desc, splitKeys), getSyncWaitTimeout(), TimeUnit.MILLISECONDS);
}{code}
While Phoenix can do something similar to what the HBase shell does, rather than
having Phoenix read the whole split file (potentially with 10k or 50k split
keys) and build the split keys array itself, it would be great if HBase could
provide a public Admin API that takes the split file as input. That would allow
any file-content validations to be built directly into HBase Admin, and HBase
Admin could take care of reading the file contents and creating the split key
array, without the client having to worry about file size.
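Until such an Admin API exists, the client-side approach described above can be sketched as follows. This is a minimal illustration, not Phoenix code: the class name {{SplitsFileReader}} is hypothetical, the validation mirrors the conditions listed in the {{createTable}} javadoc (no empty or repeated split keys), and plain UTF-8 encoding stands in for the shell's {{Bytes.toBytesBinary}}, which additionally decodes {{\xNN}} escapes.
{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SplitsFileReader {

  // Reads one split key per line, mirroring the shell's File.foreach + chomp,
  // and validates per the createTable javadoc: no empty keys, no duplicates.
  static byte[][] readSplitKeys(Path splitsFile) throws IOException {
    List<String> lines = Files.readAllLines(splitsFile, StandardCharsets.UTF_8);
    Set<String> seen = new LinkedHashSet<>();
    for (String line : lines) {
      String key = line.strip();
      if (key.isEmpty()) {
        throw new IllegalArgumentException("Empty split key in " + splitsFile);
      }
      if (!seen.add(key)) {
        throw new IllegalArgumentException("Repeated split key: " + key);
      }
    }
    byte[][] splits = new byte[seen.size()][];
    int idx = 0;
    for (String key : seen) {
      // The shell uses Bytes.toBytesBinary here; UTF-8 is a simplification
      // for this sketch and does not handle \xNN escape sequences.
      splits[idx++] = key.getBytes(StandardCharsets.UTF_8);
    }
    return splits;
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("splits", ".txt");
    Files.write(tmp, List.of("a", "m", "z"));
    byte[][] splits = readSplitKeys(tmp);
    System.out.println(splits.length);
    // With the array in hand, the caller would invoke the public Admin API:
    //   admin.createTable(tableDescriptor, splits);
  }
}{code}
The drawback, as noted above, is that the whole file must be materialized as a byte[][] on the client before the RPC, which is exactly what an Admin API accepting the file directly would avoid.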
> Support specifying splits.txt file while creating a table.
> ----------------------------------------------------------
>
> Key: PHOENIX-7309
> URL: https://issues.apache.org/jira/browse/PHOENIX-7309
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Rushabh Shah
> Priority: Major
>
> Currently the Phoenix grammar supports specifying split points while creating a
> table.
> See the grammar [here|https://phoenix.apache.org/language/index.html#create_table]
> {noformat}
> CREATE TABLE IF NOT EXISTS "my_case_sensitive_table"
> ( "id" char(10) not null primary key, "value" integer)
> DATA_BLOCK_ENCODING='NONE',VERSIONS=5,MAX_FILESIZE=2000000 split on (?,
> ?, ?)
> {noformat}
> This works fine if you have a few split points (fewer than 10-20).
> But if you want to specify 1,000 (or tens of thousands of) split points, this
> syntax becomes very cumbersome to use.
> HBase provides API to create a table with split points text file.
> {noformat}
> hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
> {noformat}
> We should also have support in Phoenix to provide split points in a text file.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)