[
https://issues.apache.org/jira/browse/PHOENIX-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Keren Gu updated PHOENIX-2209:
------------------------------
Description:
Used the asynchronous index population tool to create a local index (on one
column) on tables with 10 columns and 65M, 250M, 340M, and 1.3B rows
respectively.
Table schema is as follows (with generic column names):
{quote}
CREATE TABLE PH_SOJU_SHORT (
id INT PRIMARY KEY,
c2 VARCHAR NULL,
c3 VARCHAR NULL,
c4 VARCHAR NULL,
c5 VARCHAR NULL,
c6 VARCHAR NULL,
c7 DOUBLE NULL,
c8 VARCHAR NULL,
c9 VARCHAR NULL,
c10 BIGINT NULL
)
{quote}
Example command used (for 65M row table):
{quote}
0: jdbc:phoenix:localhost> create local index LC_INDEX_SOJU_EVAL_FN on
PH_SOJU_SHORT(C4) async;
{quote}
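As an aside (not part of the original report): after a CREATE INDEX ... ASYNC,
the index is expected to sit in the BUILDING state until the MR job populates
and activates it. A quick way to confirm this is to query SYSTEM.CATALOG; the
statement below is a hedged sketch and assumes the index metadata row carries a
non-null INDEX_STATE:
{quote}
-- hypothetical check, not run as part of this report; INDEX_STATE should
-- read as BUILDING until IndexTool completes and activates the index
SELECT TABLE_NAME, INDEX_STATE FROM SYSTEM.CATALOG
WHERE TABLE_NAME = 'LC_INDEX_SOJU_EVAL_FN' AND INDEX_STATE IS NOT NULL;
{quote}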
And MR job started with command:
{quote}
$ hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table PH_SOJU_SHORT
--index-table LC_INDEX_SOJU_EVAL_FN --output-path LC_INDEX_SOJU_EVAL_FN_HFILE
{quote}
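For reference (not a step from the original report), the HFiles produced by the
job can be inspected directly in HDFS. The listing below is a hedged sketch; it
assumes the files land under the --output-path passed above, resolved against
the submitting user's home directory:
{quote}
# hypothetical check of the generated HFiles; the path is an assumption
$ hdfs dfs -ls -R LC_INDEX_SOJU_EVAL_FN_HFILE
{quote}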
The IndexTool MR jobs finished in 18 min, 77 min, 77 min, and 2 hr 34 min
respectively, but all index tables were empty.
For the table with 65M rows, IndexTool used 12 mappers and reducers. MR counters
show map input and output records = 65M and reduce input and output records =
65M. PhoenixJobCounters input and output records are all 65M.
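"Empty" here means a count over the index returns 0 rows. A check along these
lines (a hedged illustration, not a command captured in this report; it assumes
the index can be queried directly by name) shows the mismatch against the 65M
records the counters claim were written:
{quote}
-- hypothetical verification queries
SELECT COUNT(*) FROM LC_INDEX_SOJU_EVAL_FN;  -- returns 0 despite the counters
SELECT COUNT(*) FROM PH_SOJU_SHORT;          -- returns 65M on the smallest table
{quote}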
IndexTool Reducer Log tail:
{quote}
...
2015-08-25 00:26:44,687 INFO [main] org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 32 segments left of total size: 22805636866 bytes
2015-08-25 00:26:44,693 INFO [main]
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output
Committer Algorithm version is 1
2015-08-25 00:26:44,765 INFO [main]
org.apache.hadoop.conf.Configuration.deprecation: hadoop.native.lib is
deprecated. Instead, use io.native.lib.available
2015-08-25 00:26:44,908 INFO [main]
org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is deprecated.
Instead, use mapreduce.job.skiprecords
2015-08-25 00:26:45,060 INFO [main]
org.apache.hadoop.hbase.io.hfile.CacheConfig: CacheConfig:disabled
2015-08-25 00:36:43,880 INFO [main]
org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2:
Writer=hdfs://nameservice/user/ubuntu/LC_INDEX_SOJU_EVAL_FN/_LOCAL_IDX_PH_SOJU_EVAL/_temporary/1/_temporary/attempt_1440094483400_5974_r_000000_0/0/496b926ad624438fa08626ac213d0f92,
wrote=10737418236
2015-08-25 00:36:45,967 INFO [main]
org.apache.hadoop.hbase.io.hfile.CacheConfig: CacheConfig:disabled
2015-08-25 00:38:43,095 INFO [main] org.apache.hadoop.mapred.Task:
Task:attempt_1440094483400_5974_r_000000_0 is done. And is in the process of
committing
2015-08-25 00:38:43,123 INFO [main] org.apache.hadoop.mapred.Task: Task
attempt_1440094483400_5974_r_000000_0 is allowed to commit now
2015-08-25 00:38:43,132 INFO [main]
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of
task 'attempt_1440094483400_5974_r_000000_0' to
hdfs://nameservice/user/ubuntu/LC_INDEX_SOJU_EVAL_FN/_LOCAL_IDX_PH_SOJU_EVAL/_temporary/1/task_1440094483400_5974_r_000000
2015-08-25 00:38:43,158 INFO [main] org.apache.hadoop.mapred.Task: Task
'attempt_1440094483400_5974_r_000000_0' done.
{quote}
was:
Used the asynchronous index population tool to create a local index (on one
column) on tables with 10 columns and 65M, 250M, 340M, and 1.3B rows
respectively.
Table schema is as follows (with generic column names):
{quote}
CREATE TABLE PH_SOJU_SHORT (
id INT PRIMARY KEY,
c2 VARCHAR NULL,
c3 VARCHAR NULL,
c4 VARCHAR NULL,
c5 VARCHAR NULL,
c6 VARCHAR NULL,
c7 DOUBLE NULL,
c8 VARCHAR NULL,
c9 VARCHAR NULL,
c10 BIGINT NULL
)
{quote}
Example command used (for 65M row table):
{quote}
0: jdbc:phoenix:localhost> create index LC_INDEX_SOJU_EVAL_FN on
PH_SOJU_SHORT(C4) async;
{quote}
And MR job started with command:
{quote}
$ hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table PH_SOJU_SHORT
--index-table LC_INDEX_SOJU_EVAL_FN --output-path LC_INDEX_SOJU_EVAL_FN_HFILE
{quote}
The IndexTool MR jobs finished in 18 min, 77 min, 77 min, and 2 hr 34 min
respectively, but all index tables were empty.
For the table with 65M rows, IndexTool used 12 mappers and reducers. MR counters
show map input and output records = 65M and reduce input and output records =
65M. PhoenixJobCounters input and output records are all 65M.
IndexTool Reducer Log tail:
{quote}
...
2015-08-25 00:26:44,687 INFO [main] org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 32 segments left of total size: 22805636866 bytes
2015-08-25 00:26:44,693 INFO [main]
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output
Committer Algorithm version is 1
2015-08-25 00:26:44,765 INFO [main]
org.apache.hadoop.conf.Configuration.deprecation: hadoop.native.lib is
deprecated. Instead, use io.native.lib.available
2015-08-25 00:26:44,908 INFO [main]
org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is deprecated.
Instead, use mapreduce.job.skiprecords
2015-08-25 00:26:45,060 INFO [main]
org.apache.hadoop.hbase.io.hfile.CacheConfig: CacheConfig:disabled
2015-08-25 00:36:43,880 INFO [main]
org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2:
Writer=hdfs://nameservice/user/ubuntu/LC_INDEX_SOJU_EVAL_FN/_LOCAL_IDX_PH_SOJU_EVAL/_temporary/1/_temporary/attempt_1440094483400_5974_r_000000_0/0/496b926ad624438fa08626ac213d0f92,
wrote=10737418236
2015-08-25 00:36:45,967 INFO [main]
org.apache.hadoop.hbase.io.hfile.CacheConfig: CacheConfig:disabled
2015-08-25 00:38:43,095 INFO [main] org.apache.hadoop.mapred.Task:
Task:attempt_1440094483400_5974_r_000000_0 is done. And is in the process of
committing
2015-08-25 00:38:43,123 INFO [main] org.apache.hadoop.mapred.Task: Task
attempt_1440094483400_5974_r_000000_0 is allowed to commit now
2015-08-25 00:38:43,132 INFO [main]
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of
task 'attempt_1440094483400_5974_r_000000_0' to
hdfs://nameservice/user/ubuntu/LC_INDEX_SOJU_EVAL_FN/_LOCAL_IDX_PH_SOJU_EVAL/_temporary/1/task_1440094483400_5974_r_000000
2015-08-25 00:38:43,158 INFO [main] org.apache.hadoop.mapred.Task: Task
'attempt_1440094483400_5974_r_000000_0' done.
{quote}
> Building Local Index Asynchronously via IndexTool fails to populate index
> table
> -------------------------------------------------------------------------------
>
> Key: PHOENIX-2209
> URL: https://issues.apache.org/jira/browse/PHOENIX-2209
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.5.0
> Environment: CDH: 5.4.4
> HBase: 1.0.0
> Phoenix: 4.5.0 (https://github.com/SiftScience/phoenix/tree/4.5-HBase-1.0)
> with hacks for CDH compatibility.
> Reporter: Keren Gu
> Labels: IndexTool, LocalIndex, index
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Used the asynchronous index population tool to create a local index (on one
> column) on tables with 10 columns and 65M, 250M, 340M, and 1.3B rows
> respectively.
> Table schema is as follows (with generic column names):
> {quote}
> CREATE TABLE PH_SOJU_SHORT (
> id INT PRIMARY KEY,
> c2 VARCHAR NULL,
> c3 VARCHAR NULL,
> c4 VARCHAR NULL,
> c5 VARCHAR NULL,
> c6 VARCHAR NULL,
> c7 DOUBLE NULL,
> c8 VARCHAR NULL,
> c9 VARCHAR NULL,
> c10 BIGINT NULL
> )
> {quote}
> Example command used (for 65M row table):
> {quote}
> 0: jdbc:phoenix:localhost> create local index LC_INDEX_SOJU_EVAL_FN on
> PH_SOJU_SHORT(C4) async;
> {quote}
> And MR job started with command:
> {quote}
> $ hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table
> PH_SOJU_SHORT --index-table LC_INDEX_SOJU_EVAL_FN --output-path
> LC_INDEX_SOJU_EVAL_FN_HFILE
> {quote}
> The IndexTool MR jobs finished in 18 min, 77 min, 77 min, and 2 hr 34 min
> respectively, but all index tables were empty.
> For the table with 65M rows, IndexTool used 12 mappers and reducers. MR
> counters show map input and output records = 65M and reduce input and output
> records = 65M. PhoenixJobCounters input and output records are all 65M.
> IndexTool Reducer Log tail:
> {quote}
> ...
> 2015-08-25 00:26:44,687 INFO [main] org.apache.hadoop.mapred.Merger: Down to
> the last merge-pass, with 32 segments left of total size: 22805636866 bytes
> 2015-08-25 00:26:44,693 INFO [main]
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output
> Committer Algorithm version is 1
> 2015-08-25 00:26:44,765 INFO [main]
> org.apache.hadoop.conf.Configuration.deprecation: hadoop.native.lib is
> deprecated. Instead, use io.native.lib.available
> 2015-08-25 00:26:44,908 INFO [main]
> org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is
> deprecated. Instead, use mapreduce.job.skiprecords
> 2015-08-25 00:26:45,060 INFO [main]
> org.apache.hadoop.hbase.io.hfile.CacheConfig: CacheConfig:disabled
> 2015-08-25 00:36:43,880 INFO [main]
> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2:
> Writer=hdfs://nameservice/user/ubuntu/LC_INDEX_SOJU_EVAL_FN/_LOCAL_IDX_PH_SOJU_EVAL/_temporary/1/_temporary/attempt_1440094483400_5974_r_000000_0/0/496b926ad624438fa08626ac213d0f92,
> wrote=10737418236
> 2015-08-25 00:36:45,967 INFO [main]
> org.apache.hadoop.hbase.io.hfile.CacheConfig: CacheConfig:disabled
> 2015-08-25 00:38:43,095 INFO [main] org.apache.hadoop.mapred.Task:
> Task:attempt_1440094483400_5974_r_000000_0 is done. And is in the process of
> committing
> 2015-08-25 00:38:43,123 INFO [main] org.apache.hadoop.mapred.Task: Task
> attempt_1440094483400_5974_r_000000_0 is allowed to commit now
> 2015-08-25 00:38:43,132 INFO [main]
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of
> task 'attempt_1440094483400_5974_r_000000_0' to
> hdfs://nameservice/user/ubuntu/LC_INDEX_SOJU_EVAL_FN/_LOCAL_IDX_PH_SOJU_EVAL/_temporary/1/task_1440094483400_5974_r_000000
> 2015-08-25 00:38:43,158 INFO [main] org.apache.hadoop.mapred.Task: Task
> 'attempt_1440094483400_5974_r_000000_0' done.
> {quote}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)