[
https://issues.apache.org/jira/browse/HBASE-28174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778887#comment-17778887
]
James Udiljak commented on HBASE-28174:
---------------------------------------
Hi [~wchevreuil], I think that documentation is referring to fields sent in the
request _body_ (when using JSON/XML). I can see no base64 decoding in [the
source of the function that decodes the URL
fragments|https://github.com/apache/hbase/blob/rel/2.4.17/hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/RowSpec.java#L60-L100].
Indeed, if I try a DELETE request on {{/<b64_table_name>/<b64_row_key>}} it
returns a 404 response quoting the raw base64 text and stating that the table
cannot be found:
!delete_base64_1.png|width=1084,height=275!
If I try again with {{/<raw_table_name>/<b64_row_key>}} then I receive a 200
response, but the delete marker that is placed uses the raw base64-encoded
value as its row key; it does not decode the base64 text and the row that I
intend to delete is not considered deleted:
{{
hbase:001:0> scan 'rest_test'
ROW                  COLUMN+CELL
 test_row_key        column=cf:4, timestamp=2023-10-19T02:11:10.576, value=w
 test_row_key        column=cf:5, timestamp=2023-10-19T02:11:10.576, value=v
1 row(s)
Took 0.6286 seconds
hbase:002:0>
curl --request DELETE --url http://10.123.81.34:8070/rest_test/dGVzdF9yb3dfa2V5
hbase:002:0> scan 'rest_test'
ROW                  COLUMN+CELL
 test_row_key        column=cf:4, timestamp=2023-10-19T02:11:10.576, value=w
 test_row_key        column=cf:5, timestamp=2023-10-19T02:11:10.576, value=v
1 row(s)
Took 0.0061 seconds
hbase:003:0> scan 'rest_test', RAW => true
ROW                  COLUMN+CELL
 dGVzdF9yb3dfa2V5    column=cf:, timestamp=2023-10-24T01:42:54.554, type=DeleteFamily
 test_row_key        column=cf:4, timestamp=2023-10-19T02:11:10.576, value=w
 test_row_key        column=cf:5, timestamp=2023-10-19T02:11:10.576, value=v
2 row(s)
Took 0.0108 seconds
hbase:004:0>
}}
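The behaviour in the session above can be reproduced outside HBase. The following is only a minimal sketch: {{rowKeyFromPathSegment}} is a hypothetical stand-in (not the actual {{RowSpec}} code) that assumes the path segment is percent-decoded as UTF-8 and then turned into bytes as UTF-8, with no base64 step, as described in the issue. Under that assumption, the base64 text becomes the row key verbatim, and a percent-encoded high byte such as {{%FF}} turns into the three-byte UTF-8 encoding of the replacement character.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class RowKeyDecodingSketch {
    // Hypothetical stand-in for the URL handling described in the issue:
    // percent-decode the segment as UTF-8, then encode the string as UTF-8.
    // No base64 decoding takes place anywhere on this path.
    static byte[] rowKeyFromPathSegment(String segment) {
        try {
            String decoded = URLDecoder.decode(segment, "UTF-8");
            return decoded.getBytes(StandardCharsets.UTF_8);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Render bytes as upper-case hex for inspection.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String b64 = "dGVzdF9yb3dfa2V5";

        byte[] actual = rowKeyFromPathSegment(b64);       // what the sketch produces
        byte[] intended = Base64.getDecoder().decode(b64); // what the caller meant

        System.out.println(new String(actual, StandardCharsets.UTF_8));   // dGVzdF9yb3dfa2V5
        System.out.println(new String(intended, StandardCharsets.UTF_8)); // test_row_key
        System.out.println(Arrays.equals(actual, intended));              // false

        // A lone high byte is not valid UTF-8, so it decodes to U+FFFD,
        // which re-encodes as EF BF BD rather than the single byte FF.
        System.out.println(hex(rowKeyFromPathSegment("%FF")));            // EFBFBD
    }
}
```

This matches the RAW scan: the delete marker lands on the row {{dGVzdF9yb3dfa2V5}} rather than on {{test_row_key}}.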
> DELETE endpoint in REST API does not support deleting binary row keys/columns
> -----------------------------------------------------------------------------
>
> Key: HBASE-28174
> URL: https://issues.apache.org/jira/browse/HBASE-28174
> Project: HBase
> Issue Type: Bug
> Components: REST
> Affects Versions: 2.4.17
> Reporter: James Udiljak
> Priority: Blocker
> Attachments: delete_base64_1.png
>
>
> h2. Notes
> This is the first time I have raised an issue in the ASF Jira. Please let me
> know if there's anything I need to adjust on the issue to fit in with your
> development flow.
> I have marked the priority as "blocker" because this issue blocks me as a
> user of the HBase REST API from deploying an effective solution for our
> setup. Please feel free to change this if the Priority field has another
> meaning to you.
> I have also chosen 2.4.17 as the affected version because this is the version
> I am running. However, looking at the source code in the default branch on
> GitHub, I think many other versions are affected as well.
> h2. Description of Issue
> The DELETE operation in the [HBase REST
> API|https://hbase.apache.org/1.2/apidocs/org/apache/hadoop/hbase/rest/package-summary.html#operation_delete]
> requires specifying row keys and column families/offsets in the URI (i.e. as
> UTF-8 text). This makes it impossible to specify a delete operation via the
> REST API for a binary row key or column family/offset, as single bytes with a
> decimal value greater than 127 are not valid in UTF-8.
> Percent-encoding these "high" values does not work around the issue, as the
> HBase REST API uses Java's {{URLDecoder.decode(percentEncodedString,
> "UTF-8")}} function, which replaces any percent-encoded byte in the range
> {{%80}} to {{%FF}} that does not form part of a valid UTF-8 sequence with the
> [replacement
> character|https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character].
> Even if this were not the case, the row-key is ultimately [converted to a
> byte
> array|https://github.com/apache/hbase/blob/rel/2.4.17/hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/RowSpec.java#L60-L100]
> using UTF-8 encoding, wherein code points >127 are encoded across multiple
> bytes, corrupting the user-supplied row key.
> h2. Proposed Solution
> I do not believe it is possible to allow encoding of arbitrary bytes in the
> URL for the DELETE endpoint without breaking compatibility for any users who
> may have been unknowingly UTF-8 encoding their binary row keys. Even if it
> were possible, the syntax would likely be terse.
> Instead, I propose a new version of the DELETE endpoint that would accept row
> keys and column families/offsets in the request _body_ (using Base64 encoding
> for the JSON and XML formats, and bare binary for protobuf). This new
> endpoint would follow the same conventions as the PUT operations, except that
> cell values would not need to be specified (unless the user is performing a
> check-and-delete operation).
> As an additional benefit, using the request body could potentially allow for
> deleting multiple rows in a single request, which would drastically improve
> the efficiency of my use case.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)