[ 
https://issues.apache.org/jira/browse/PHOENIX-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117549#comment-14117549
 ] 

James Taylor commented on PHOENIX-1227:
---------------------------------------

I agree, this shouldn't be allowed. There should be a check for 
MD5(v).getDataType().isCoercibleTo(v.getDataType()). If that check fails, we 
should throw an exception. You could put an explicit CAST in the query if 
need be.
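
As a rough illustration of the kind of check meant here, the following is a 
minimal, self-contained sketch. It does not use Phoenix's real classes: the 
SqlType enum, its coercion rules, and checkUpsertSelectTypes are all 
hypothetical stand-ins chosen only to show the shape of the validation.

```java
public class CoercionCheckSketch {
    // Hypothetical, simplified stand-in for a SQL type system.
    // This is NOT Phoenix's actual data type class.
    enum SqlType {
        INTEGER, BIGINT, BINARY, VARBINARY;

        // Assumed coercion rules, for illustration only.
        boolean isCoercibleTo(SqlType target) {
            if (this == target) {
                return true;
            }
            switch (this) {
                case INTEGER:
                    return target == BIGINT;
                case BINARY:
                    return target == VARBINARY;
                default:
                    return false;
            }
        }
    }

    // Hypothetical compile-time check: reject the statement if the selected
    // expression's type cannot be coerced to the target column's type.
    static void checkUpsertSelectTypes(SqlType source, SqlType target) {
        if (!source.isCoercibleTo(target)) {
            throw new IllegalArgumentException(
                "Type mismatch: " + source + " cannot be coerced to " + target);
        }
    }

    public static void main(String[] args) {
        // Allowed under the assumed rules: INTEGER widens to BIGINT.
        checkUpsertSelectTypes(SqlType.INTEGER, SqlType.BIGINT);

        // Rejected: a binary result (such as an MD5 hash) into an INTEGER column.
        try {
            checkUpsertSelectTypes(SqlType.BINARY, SqlType.INTEGER);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

With a check of this shape, the statement would be rejected when it is 
compiled rather than writing mismatched bytes into the column.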

> Upsert select of binary data doesn't always correctly coerce data into 
> correct format
> -------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1227
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1227
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Gabriel Reid
>
> If you run an upsert select statement that selects a binary value and writes 
> it to a numerical column (or probably a column of another type as well), you 
> can end up with invalid binary values stored in HBase.
> For example, if v is an {{INTEGER}} column, in a statement like this:
> {code}UPSERT INTO MYTABLE (v) SELECT MD5(v) FROM MYTABLE{code}
> the literal 16-byte binary values from the MD5 function will be added 
> verbatim into the field v. 
> This is a really big problem if v is the key field, as it can even lead to 
> multiple keys with what appear to be the same value. This happens if there 
> are multiple (invalid) row keys that begin with the same 4 bytes, as only the 
> first 4 bytes of the key will be shown when selecting data from the column, 
> but the different full-length values of the row keys will lead to multiple 
> records.
> Somewhat related to this, a statement like the following (with a constant 
> binary value) will fail immediately due to datatype mismatch:
> {code}UPSERT INTO MYTABLE (v) SELECT MD5(1) FROM MYTABLE{code}
> It seems that the first expression above should probably fail in the same way 
> as the expression with the constant binary value (or neither of them should 
> fail). Obviously, no invalid values should be written into HBase.
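
The duplicate-key symptom described in the quoted report can be sketched in 
isolation. The example below is an illustration only: it assumes a plain 
big-endian read of the first 4 bytes, which may differ from the actual 
encoding Phoenix uses for INTEGER, and the class and method names are 
hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class TruncatedKeySketch {
    // Reads only the first 4 bytes of a row key, the way a fixed-width
    // 4-byte integer decoder would. Plain big-endian decoding is an
    // assumption made for this sketch.
    static int decodeFirstFourBytes(byte[] rowKey) {
        return ByteBuffer.wrap(rowKey, 0, 4).getInt();
    }

    public static void main(String[] args) {
        // Two 16-byte keys (the length of an MD5 digest) that share their
        // first 4 bytes but differ in the last byte.
        byte[] keyA = new byte[16];
        byte[] keyB = new byte[16];
        keyA[3] = 7;
        keyB[3] = 7;
        keyA[15] = 1;
        keyB[15] = 2;

        // The full keys differ, so they would be stored as separate rows.
        System.out.println("Same full key: " + Arrays.equals(keyA, keyB));

        // Decoded as 4-byte integers, they look identical.
        System.out.println("Same decoded value: "
            + (decodeFirstFourBytes(keyA) == decodeFirstFourBytes(keyB)));
    }
}
```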



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
