[jira] [Commented] (METRON-1809) Support Column Oriented Input with Batch Profiler

ASF GitHub Bot (JIRA) Tue, 09 Oct 2018 08:36:22 -0700


    [ 
https://issues.apache.org/jira/browse/METRON-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643644#comment-16643644
 ]


ASF GitHub Bot commented on METRON-1809:
----------------------------------------

Github user justinleet commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1229#discussion_r223754681
  
    --- Diff: metron-analytics/metron-profiler-spark/README.md ---
    @@ -265,6 +290,18 @@ The path to the input data read by the Batch Profiler.
     
     The format of the input data read by the Batch Profiler.
     
    +### `profiler.batch.input.reader`
    --- End diff --
    
    The main thing I'm getting at here is what happens if instead of 
    ```
    profiler.batch.input.reader=COLUMNAR
    profiler.batch.input.format=org.apache.spark.sql.execution.datasources.orc
    ```
    
    I say
    ```
    profiler.batch.input.reader=TEXT
    profiler.batch.input.format=org.apache.spark.sql.execution.datasources.orc
    ```
    
    Correct me if I'm wrong, but I believe it'll instantiate a 
`TextEncodedTelemetryReader` instead of a `ColumnEncodedTelemetryReader`, then 
fail to read the file.
    
    This is a very easy misconfiguration to make as things stand.  Would it 
be reasonable to keep both fields, but let users shortcut known formats 
(e.g. ORC and Parquet)?  Alternatively, could we log a warning that a known 
misconfiguration was detected and then proceed with the COLUMNAR option anyway?
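
To make the "warn and proceed" option concrete, here is a rough sketch of what I have in mind. Everything in it is hypothetical: `ReaderTypeResolver`, `isKnownColumnar`, and `resolve` are illustrative names, not existing Metron classes, and matching on the last dot-separated segment of the format string is only an assumption about how known formats could be detected.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical helper, not part of Metron: reconciles the reader and format properties.
final class ReaderTypeResolver {
    private static final Logger LOG = Logger.getLogger(ReaderTypeResolver.class.getName());

    // Assumption: these short names identify column-oriented formats.
    private static final Set<String> KNOWN_COLUMNAR =
            new HashSet<>(Arrays.asList("orc", "parquet"));

    // True if the last dot-separated segment of the format is a known columnar format.
    static boolean isKnownColumnar(String format) {
        String lower = format.toLowerCase(Locale.ROOT);
        String shortName = lower.substring(lower.lastIndexOf('.') + 1);
        return KNOWN_COLUMNAR.contains(shortName);
    }

    // Returns the reader type to use, overriding a mismatched setting with a warning.
    static String resolve(String reader, String format) {
        if (isKnownColumnar(format) && !"COLUMNAR".equalsIgnoreCase(reader)) {
            LOG.warning("Reader '" + reader + "' does not match columnar format '"
                    + format + "'; proceeding with COLUMNAR");
            return "COLUMNAR";
        }
        return reader.toUpperCase(Locale.ROOT);
    }
}
```

With something like this, `resolve("TEXT", "org.apache.spark.sql.execution.datasources.orc")` would log the warning and return `COLUMNAR`, while any other reader/format combination is passed through untouched.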


> Support Column Oriented Input with Batch Profiler
> -------------------------------------------------
>
>                 Key: METRON-1809
>                 URL: https://issues.apache.org/jira/browse/METRON-1809
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>            Priority: Major
>
> The Batch Profiler currently only accepts input formats that can be directly 
> serialized to JSON.  This should be enhanced to accept a wider variety of 
> input formats.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
