[ 
https://issues.apache.org/jira/browse/PHOENIX-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617036#comment-16617036
 ] 

JeongMin Ju commented on PHOENIX-4872:
--------------------------------------

You seem to be misunderstanding the core issue.

The key point is that bulk loading leaves out the marker (empty) column that 
Phoenix requires.

You have to check the actual data using the HBase shell after bulk loading.

You will notice that there is no column 0:\x00\x00\x00\x00.

This is clearly different from when you use the upsert statement.

Try using 'select count(1) from table' instead of a group by query in your 
test case.

In a group by query, the scan is generated as follows and is processed 
normally.

"families":{"0":["\\x00\\x00\\x00\\x00"],"1":["ALL"]}
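
To illustrate why a scan limited to the marker column finds nothing after a 
bulk load, while a scan that also requests all of family 1 still returns the 
row, here is a toy model in plain Java. It is only a sketch: it does not use 
the HBase or Phoenix APIs, and the row layout, the qualifier name CELL, and 
all class and method names are assumptions made up for this illustration. It 
assumes a row is returned when it has at least one cell in a requested column.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: no HBase or Phoenix classes are used.
class EmptyColumnScanSketch {
    // Marker (empty) column qualifier, written the way the HBase shell prints it.
    static final String MARKER = "\\x00\\x00\\x00\\x00";
    // Stands for "every qualifier in this family", as in the scan description.
    static final String ALL = "ALL";
    // Hypothetical qualifier for the cell that holds the packed column values.
    static final String CELL = "CELL";

    static Set<String> qualifiers(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // Row written by UPSERT: family 0 carries the marker column.
    static Map<String, Set<String>> upsertedRow() {
        Map<String, Set<String>> row = new HashMap<>();
        row.put("0", qualifiers(MARKER, CELL));
        row.put("1", qualifiers(CELL));
        return row;
    }

    // Row written by the bulk load tool: same data, but no marker column.
    static Map<String, Set<String>> bulkLoadedRow() {
        Map<String, Set<String>> row = new HashMap<>();
        row.put("0", qualifiers(CELL));
        row.put("1", qualifiers(CELL));
        return row;
    }

    // Scan that asks only for the marker column of family 0.
    static Map<String, List<String>> markerOnlyScan() {
        return Collections.singletonMap("0", Collections.singletonList(MARKER));
    }

    // Scan that asks for the marker column plus everything in family 1.
    static Map<String, List<String>> markerAndFamily1Scan() {
        Map<String, List<String>> families = new HashMap<>();
        families.put("0", Collections.singletonList(MARKER));
        families.put("1", Collections.singletonList(ALL));
        return families;
    }

    // A row is returned when it has at least one cell in a requested column.
    static boolean matches(Map<String, Set<String>> row,
                           Map<String, List<String>> families) {
        for (Map.Entry<String, List<String>> wanted : families.entrySet()) {
            Set<String> present = row.get(wanted.getKey());
            if (present == null || present.isEmpty()) {
                continue;
            }
            for (String qualifier : wanted.getValue()) {
                if (ALL.equals(qualifier) || present.contains(qualifier)) {
                    return true;
                }
            }
        }
        return false;
    }

    static long count(List<Map<String, Set<String>>> rows,
                      Map<String, List<String>> families) {
        long n = 0;
        for (Map<String, Set<String>> row : rows) {
            if (matches(row, families)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<Map<String, Set<String>>> upserted =
                Collections.singletonList(upsertedRow());
        List<Map<String, Set<String>>> bulk =
                Collections.singletonList(bulkLoadedRow());
        System.out.println("upsert, marker only:     " + count(upserted, markerOnlyScan()));
        System.out.println("bulk,   marker only:     " + count(bulk, markerOnlyScan()));
        System.out.println("bulk,   marker + fam 1:  " + count(bulk, markerAndFamily1Scan()));
    }
}
```

In this model the marker-only scan counts the upserted row but not the 
bulk-loaded one, while the scan that also covers family 1 still counts the 
bulk-loaded row, so the missing marker column goes unnoticed.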

 

> BulkLoad has bug when loading on single-cell-array-with-offsets table.
> ----------------------------------------------------------------------
>
>                 Key: PHOENIX-4872
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4872
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.11.0, 4.12.0, 4.13.0, 4.14.0
>            Reporter: JeongMin Ju
>            Assignee: Swaroopa Kadam
>            Priority: Critical
>
> CsvBulkLoadTool creates incorrect data for a SCAWO 
> (SingleCellArrayWithOffsets) table.
> Every Phoenix table needs a marker (empty) column, but CsvBulkLoadTool does 
> not create that column for SCAWO tables.
> If you check the data through HBase Shell, you can see that there is no 
> corresponding column.
> If the row is written by an Upsert query, the column is created normally.
> {code:java}
> column=0:\x00\x00\x00\x00, timestamp=1535420036372, value=x
> {code}
> Since the column shown above is missing, every Group By query returns zero 
> rows.
> This is because "families": {"0": ["\\x00\\x00\\x00\\x00"]} is added to the 
> columns of the Scan object.
> Because the CsvBulkLoadTool has not created the column, the result of the 
> scan is empty.
>  
> This problem applies only to tables with multiple column families. A table 
> with a single column family happens to work, because 
> "families": {"0": ["ALL"]} is added to the columns of the Scan object for 
> such a table.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
