GitHub user justinleet commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1229#discussion_r223754681
  
    --- Diff: metron-analytics/metron-profiler-spark/README.md ---
    @@ -265,6 +290,18 @@ The path to the input data read by the Batch Profiler.
     
     The format of the input data read by the Batch Profiler.
     
    +### `profiler.batch.input.reader`
    --- End diff --
    
    The main thing I'm getting at here is what happens if instead of 
    ```
    profiler.batch.input.reader=COLUMNAR
    profiler.batch.input.format=org.apache.spark.sql.execution.datasources.orc
    ```
    
    I say
    ```
    profiler.batch.input.reader=TEXT
    profiler.batch.input.format=org.apache.spark.sql.execution.datasources.orc
    ```
    
    Correct me if I'm wrong, but I believe it'll instantiate a 
`TextEncodedTelemetryReader` instead of a `ColumnEncodedTelemetryReader`, then 
fail to read the file.
    
    As it stands, this is a very easy misconfiguration to make.  Would it be 
reasonable to keep both fields, but provide a shortcut for known formats 
(e.g. ORC and Parquet)?  Or, when a known mismatch is detected, log a warning 
and then proceed with the COLUMNAR option anyway?
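
    To make the warn-and-proceed option concrete, here is a rough sketch of 
the check. Everything in it is an assumption for illustration: `ReaderResolver`, 
`ReaderKind`, `isKnownColumnar` and `resolve` are hypothetical names, not 
classes from the PR, and matching the format by its `orc`/`parquet` suffix is 
just one possible way to recognize the known columnar formats.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.logging.Logger;

// Hypothetical helper, not part of the PR: decides which reader kind to use
// given the two configured property values.
class ReaderResolver {
    private static final Logger LOG = Logger.getLogger(ReaderResolver.class.getName());

    // Stand-ins for the TEXT and COLUMNAR values of profiler.batch.input.reader.
    enum ReaderKind { TEXT, COLUMNAR }

    // Assumption: formats that are known to need the columnar reader.
    private static final List<String> KNOWN_COLUMNAR = Arrays.asList("orc", "parquet");

    // True if the format is a known columnar one, given either as a short
    // name ("orc") or as a dotted name ending in it ("...datasources.orc").
    static boolean isKnownColumnar(String format) {
        if (format == null) {
            return false;
        }
        String f = format.trim().toLowerCase(Locale.ROOT);
        for (String name : KNOWN_COLUMNAR) {
            if (f.equals(name) || f.endsWith("." + name)) {
                return true;
            }
        }
        return false;
    }

    // Returns the reader kind to use. If TEXT was requested for a known
    // columnar format, logs a warning and falls back to COLUMNAR.
    static ReaderKind resolve(String reader, String format) {
        ReaderKind requested = ReaderKind.valueOf(reader.trim().toUpperCase(Locale.ROOT));
        if (requested == ReaderKind.TEXT && isKnownColumnar(format)) {
            LOG.warning("Reader TEXT configured with columnar format '" + format
                    + "'; using COLUMNAR instead");
            return ReaderKind.COLUMNAR;
        }
        return requested;
    }
}
```

    With this, `ReaderResolver.resolve("TEXT", 
"org.apache.spark.sql.execution.datasources.orc")` logs the warning and returns 
`COLUMNAR`, while a `TEXT` reader with a format such as `json` is left alone. 
The shortcut option would be the same lookup run the other way: derive the 
reader from the format when the reader property is not set.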

