[GitHub] nifi issue #2478: NIFI-4833 Add scanHBase Processor

bbende Mon, 05 Mar 2018 08:11:07 -0800

Github user bbende commented on the issue:

    https://github.com/apache/nifi/pull/2478
  
    One other question, what do you envision people will most likely do with the 
output of this processor?
    
    The reason I'm asking is because I'm debating if it makes sense to write 
multiple JSON documents to a single flow file without wrapping them in an 
array. GetHBase and FetchHBase didn't have this problem because they wrote a 
row per flow file (which probably wasn't a good idea for GetHBase).
    
    As an example scenario, say we have a bunch of rows coming out of this 
processor using the col-qual-val format like:
    ```
    {"id":"", "message":"The time is Mon Mar 05 10:20:07 EST 2018"}
    {"id":"", "message":"The time is Mon Mar 05 10:21:03 EST 2018"}
    {"id":"", "message":"The time is Mon Mar 05 10:22:44 EST 2018"}
    {"id":"", "message":"The time is Mon Mar 05 10:22:44 EST 2018"}
    {"id":"", "message":"The time is Mon Mar 05 10:22:44 EST 2018"}
    {"id":"", "message":"The time is Mon Mar 05 10:22:44 EST 2018"}
    {"id":"", "message":"The time is Mon Mar 05 10:22:44 EST 2018"}
    {"id":"", "message":"The time is Mon Mar 05 10:22:44 EST 2018"}
    {"id":"", "message":"The time is Mon Mar 05 10:22:44 EST 2018"}
    ```
    
    If we then created a schema for this:
    ```
    {
      "name": "scan",
      "namespace": "nifi",
      "type": "record",
      "fields": [
        { "name": "id", "type": "string" },
        { "name": "message", "type": "string" }
      ]
    }
    ```
    and then tried to use ConvertRecord with a JsonTreeReader and 
CsvRecordSetWriter to convert from JSON to CSV, we would get:
    ```
    id,message
    "",The time is Mon Mar 05 10:20:07 EST 2018
    ```
    It only ends up converting the first JSON document because the 
JsonTreeReader doesn't know how to read multiple records unless the input is a 
JSON array.
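
To make the difference between the two output shapes concrete, here is a 
minimal standalone Python sketch. It is not NiFi code, and the function names 
`split_concatenated_json` and `wrap_in_array` are made up for illustration. It 
reads back-to-back JSON documents like the ones above and wraps them in a 
single JSON array, which is the shape an array-oriented reader could consume 
as multiple records.

```python
import json


def split_concatenated_json(text):
    """Yield each top-level JSON document found in text, in order."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        # raw_decode returns the parsed value and the index where it ended.
        doc, pos = decoder.raw_decode(text, pos)
        yield doc


def wrap_in_array(text):
    """Return a JSON array string containing every document in text."""
    return json.dumps(list(split_concatenated_json(text)))


# Hypothetical flow file content: one JSON document per line, no array.
flow_file_content = (
    '{"id":"", "message":"The time is Mon Mar 05 10:20:07 EST 2018"}\n'
    '{"id":"", "message":"The time is Mon Mar 05 10:21:03 EST 2018"}\n'
    '{"id":"", "message":"The time is Mon Mar 05 10:22:44 EST 2018"}\n'
)

records = list(split_concatenated_json(flow_file_content))
wrapped = wrap_in_array(flow_file_content)
```

The point is only that the unwrapped form needs a reader that keeps parsing 
after the first document ends, while the wrapped form can be handled by any 
reader that understands a JSON array.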
    
    There may be cases where the current output makes sense so I'm not saying 
to change it yet, but just trying to think of what the most common scenario 
will be.
