stack created HBASE-16425:
-----------------------------
Summary: [Operability] Autohandling 'bad data'
Key: HBASE-16425
URL: https://issues.apache.org/jira/browse/HBASE-16425
Project: HBase
Issue Type: Brainstorming
Components: Operability
Reporter: stack
This is a brainstorming issue. It came up chatting w/ a couple of operators
talking about 'bad data'; i.e. no matter how you control your clients, someone
by mistake or under a misconception will load an out-of-spec Cell or Row. In
this particular case, two types of 'bad data' were talked about:
(on) The Big Cell: An upload of a 'big cell' came in via bulkload, but it so
happened that their frontends all arrived at the malignant Cell at the same time,
so hundreds of threads were requesting the big cell. The RS OOME'd. Then when the
region opened on the new RS, it OOME'd too, etc. Could we switch to chunking when a
Server sees that it has a large Cell on its hands? I suppose bulk load could
defeat any Put chunking we had in place but it would be good to have this too.
Chatting w/ Matteo, we probably want to just move to the streaming Interface
that we've talked of in the past at various times; the Get would chunk out the
big Cell for assembly on the Client, or just give back the Cell in pieces -- an
OutputStream for the Application to suck on. New API and/or old API could use
it when Cells are big.
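A minimal sketch of the "give back the Cell in pieces" idea, in plain Python
rather than against any HBase API. The names chunk_cell, assemble and
CHUNK_SIZE are hypothetical, and the tiny chunk size is only there to make the
example easy to follow; a real server would pick something much larger and
would read the pieces from the HFile rather than from an in-memory value.

```python
from typing import Iterable, Iterator

CHUNK_SIZE = 4  # hypothetical; deliberately tiny for illustration


def chunk_cell(value: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the cell value in pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(value), chunk_size):
        yield value[offset:offset + chunk_size]


def assemble(chunks: Iterable[bytes]) -> bytes:
    """Client-side reassembly of the streamed pieces into one value."""
    return b"".join(chunks)


if __name__ == "__main__":
    pieces = list(chunk_cell(b"abcdefghij"))
    print(pieces)
    print(assemble(pieces))
```

An Application that does not need the whole value at once could consume the
generator piece by piece instead of calling assemble, so that only one chunk
is in flight per request at a time.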
(on) The user had a row with 29M Columns in it because the default entity had
id=-1.... In this case chunking the Scan (v1.1+) helps but the operator was
having trouble finding the problem row. How could we surface anomalies like
this for operators? On flush, we could add even more metadata to the HFile
(Yahoo! Data Sketches as [~jleach] has been suggesting) and then have an offline
tool read the metadata and run it through a few simple rules. Data Sketches are
mergeable so we could build up a region-view or store-view....
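A rough sketch of the "mergeable metadata plus a few simple rules" idea. This
is not the Data Sketches library: it swaps in plain max trackers, which are
exact rather than probabilistic, just to show the shape of a per-file summary
that merges into a store-view or region-view. FileSummary, observe_row, merge,
check and both thresholds are hypothetical names and values.

```python
from typing import List, Optional


class FileSummary:
    """Hypothetical per-HFile metadata gathered at flush time."""

    def __init__(self) -> None:
        self.max_cell_bytes = 0
        self.max_columns_in_row = 0
        self.widest_row: Optional[bytes] = None

    def observe_row(self, row: bytes, cell_sizes: List[int]) -> None:
        """Record one row: its key and the size in bytes of each cell."""
        if cell_sizes:
            self.max_cell_bytes = max(self.max_cell_bytes, max(cell_sizes))
        if len(cell_sizes) > self.max_columns_in_row:
            self.max_columns_in_row = len(cell_sizes)
            self.widest_row = row

    def merge(self, other: "FileSummary") -> "FileSummary":
        """Combine two summaries into a new one, leaving both inputs intact."""
        merged = FileSummary()
        merged.max_cell_bytes = max(self.max_cell_bytes, other.max_cell_bytes)
        # A row can span several files, so the merged column count is only
        # a lower bound on the true width of the widest row.
        widest = self if self.max_columns_in_row >= other.max_columns_in_row else other
        merged.max_columns_in_row = widest.max_columns_in_row
        merged.widest_row = widest.widest_row
        return merged


def check(summary: FileSummary,
          max_cell_bytes: int = 10 * 1024 * 1024,  # assumed threshold
          max_columns: int = 1_000_000) -> List[str]:  # assumed threshold
    """Run the summary through simple rules; return human-readable findings."""
    findings = []
    if summary.max_cell_bytes > max_cell_bytes:
        findings.append("big cell: %d bytes" % summary.max_cell_bytes)
    if summary.max_columns_in_row > max_columns:
        findings.append("wide row %r: %d columns"
                        % (summary.widest_row, summary.max_columns_in_row))
    return findings
```

Keeping the row key next to the column count is what would let the operator
find the problem row, which was the hard part in the case above.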
This is sketchy and I'm pretty sure repeats stuff in old issues but parking
this note here while the encounter is still fresh.