[
https://issues.apache.org/jira/browse/PHOENIX-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715275#comment-14715275
]
Keren Gu commented on PHOENIX-2209:
-----------------------------------
This is from a different run, also creating local indexes using IndexTool; it
appears to load the HFiles just fine, but when I then look into the index table
(LC_INDEX_SOJU_PROD_FN in this case), it is empty (see the count check sketched
after the log):
15/08/25 03:10:35 INFO mapreduce.LoadIncrementalHFiles: Trying to load
hfile=hdfs://nameservice/user/ubuntu/LC_INDEX_SOJU_PROD_FN/_LOCAL_IDX_PH_SOJU_PROD/0/cf1b91d0c0b44de39abebfa1dc5762fb
first=\x00\x01Transaction_payment_type\x00\xB9X$:
last=\x00\x01extras_req_email_domainDomainMX_list_item_5\x00\xA0\xFD\x005
15/08/25 03:10:35 INFO mapreduce.LoadIncrementalHFiles: Trying to load
hfile=hdfs://nameservice/user/ubuntu/LC_INDEX_SOJU_PROD_FN/_LOCAL_IDX_PH_SOJU_PROD/0/ff09590406c94a2f9f1952db44d7dc60
first=\x00\x01PerMinute_market_item_click\x00\xF1\x00\x89\x85
last=\x00\x01Transaction_payment_type\x00\xB9X"5
15/08/25 03:10:35 INFO mapreduce.LoadIncrementalHFiles: Trying to load
hfile=hdfs://nameservice/user/ubuntu/LC_INDEX_SOJU_PROD_FN/_LOCAL_IDX_PH_SOJU_PROD/0/a7baca724668401b9cc592271ec4c241
first=\x00\x01EmailBillingNameMatch\x00\xDE\x9A&\x08
last=\x00\x01HasTxButNoPageView\x00\x9B.\xCB\xC9
15/08/25 03:10:35 INFO index.IndexTool: Removing output directory
LC_INDEX_SOJU_PROD_FN/_LOCAL_IDX_PH_SOJU_PROD
15/08/25 03:10:36 INFO index.IndexTool: Updated the status of the index
LC_INDEX_SOJU_PROD_FN to ACTIVE
15/08/25 03:10:36 INFO client.ConnectionManager$HConnectionImplementation:
Closing zookeeper sessionid=0x54f4da9c1bcfebf
15/08/25 03:10:36 INFO zookeeper.ZooKeeper: Session: 0x54f4da9c1bcfebf closed
15/08/25 03:10:36 INFO zookeeper.ClientCnxn: EventThread shut down
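For reference, the emptiness was observed with a check along these lines (a
minimal sketch, not the exact query; it assumes the local index can be queried
directly by name from sqlline):
{quote}
-- count rows in the freshly built local index; it comes back 0 even though
-- LoadIncrementalHFiles reported loading the HFiles above
SELECT COUNT(*) FROM LC_INDEX_SOJU_PROD_FN;
{quote}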
> Building Local Index Asynchronously via IndexTool fails to populate index
> table
> -------------------------------------------------------------------------------
>
> Key: PHOENIX-2209
> URL: https://issues.apache.org/jira/browse/PHOENIX-2209
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.5.0
> Environment: CDH: 5.4.4
> HBase: 1.0.0
> Phoenix: 4.5.0 (https://github.com/SiftScience/phoenix/tree/4.5-HBase-1.0)
> with hacks for CDH compatibility.
> Reporter: Keren Gu
> Labels: IndexTool, LocalIndex, index
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Using the asynchronous index population tool to create a local index (on 1
> column) on tables with 10 columns and 65M, 250M, 340M, and 1.3B rows
> respectively.
> Table Schema as follows (with generic column names):
> {quote}
> CREATE TABLE PH_SOJU_SHORT (
> id INT PRIMARY KEY,
> c2 VARCHAR NULL,
> c3 VARCHAR NULL,
> c4 VARCHAR NULL,
> c5 VARCHAR NULL,
> c6 VARCHAR NULL,
> c7 DOUBLE NULL,
> c8 VARCHAR NULL,
> c9 VARCHAR NULL,
> c10 BIGINT NULL
> )
> {quote}
> Example command used (for 65M row table):
> {quote}
> 0: jdbc:phoenix:localhost> create index LC_INDEX_SOJU_EVAL_FN on
> PH_SOJU_SHORT(C4) async;
> {quote}
> And MR job started with command:
> {quote}
> $ hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table
> PH_SOJU_SHORT --index-table LC_INDEX_SOJU_EVAL_FN --output-path
> LC_INDEX_SOJU_EVAL_FN_HFILE
> {quote}
> The IndexTool MR jobs finished in 18min, 77min, 77min, and 2hr 34min
> respectively, but all index tables were empty.
> For the table with 65M rows, IndexTool had 12 mappers and reducers. MR
> counters show map input and output records = 65M, and reduce input and output
> records = 65M. PhoenixJobCounters input and output records are all 65M.
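> A pair of queries like the following illustrates the mismatch (a hedged
> sketch, not the exact commands run; it assumes the index metadata is visible
> in SYSTEM.CATALOG and that the index can be counted directly by name):
> {quote}
> -- index metadata: IndexTool logs updating the index status to ACTIVE on completion
> SELECT TABLE_NAME, INDEX_STATE FROM SYSTEM.CATALOG
> WHERE TABLE_NAME = 'LC_INDEX_SOJU_EVAL_FN' AND INDEX_STATE IS NOT NULL;
> -- actual contents: the count nevertheless comes back 0
> SELECT COUNT(*) FROM LC_INDEX_SOJU_EVAL_FN;
> {quote}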
> IndexTool Reducer Log tail:
> {quote}
> ...
> 2015-08-25 00:26:44,687 INFO [main] org.apache.hadoop.mapred.Merger: Down to
> the last merge-pass, with 32 segments left of total size: 22805636866 bytes
> 2015-08-25 00:26:44,693 INFO [main]
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output
> Committer Algorithm version is 1
> 2015-08-25 00:26:44,765 INFO [main]
> org.apache.hadoop.conf.Configuration.deprecation: hadoop.native.lib is
> deprecated. Instead, use io.native.lib.available
> 2015-08-25 00:26:44,908 INFO [main]
> org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is
> deprecated. Instead, use mapreduce.job.skiprecords
> 2015-08-25 00:26:45,060 INFO [main]
> org.apache.hadoop.hbase.io.hfile.CacheConfig: CacheConfig:disabled
> 2015-08-25 00:36:43,880 INFO [main]
> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2:
> Writer=hdfs://nameservice/user/ubuntu/LC_INDEX_SOJU_EVAL_FN/_LOCAL_IDX_PH_SOJU_EVAL/_temporary/1/_temporary/attempt_1440094483400_5974_r_000000_0/0/496b926ad624438fa08626ac213d0f92,
> wrote=10737418236
> 2015-08-25 00:36:45,967 INFO [main]
> org.apache.hadoop.hbase.io.hfile.CacheConfig: CacheConfig:disabled
> 2015-08-25 00:38:43,095 INFO [main] org.apache.hadoop.mapred.Task:
> Task:attempt_1440094483400_5974_r_000000_0 is done. And is in the process of
> committing
> 2015-08-25 00:38:43,123 INFO [main] org.apache.hadoop.mapred.Task: Task
> attempt_1440094483400_5974_r_000000_0 is allowed to commit now
> 2015-08-25 00:38:43,132 INFO [main]
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of
> task 'attempt_1440094483400_5974_r_000000_0' to
> hdfs://nameservice/user/ubuntu/LC_INDEX_SOJU_EVAL_FN/_LOCAL_IDX_PH_SOJU_EVAL/_temporary/1/task_1440094483400_5974_r_000000
> 2015-08-25 00:38:43,158 INFO [main] org.apache.hadoop.mapred.Task: Task
> 'attempt_1440094483400_5974_r_000000_0' done.
> {quote}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)