Github user nickwallen commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1229#discussion_r223743370
  
    --- Diff: metron-analytics/metron-profiler-spark/README.md ---
    @@ -265,6 +290,18 @@ The path to the input data read by the Batch Profiler.
     
     The format of the input data read by the Batch Profiler.
     
    +### `profiler.batch.input.reader`
    --- End diff --
    
    That is a valid option.  The only reason I did not do that is that we 
would have to explicitly support each format (JSON, CSV, ORC, Parquet, and so 
on).  With these two switches, a user can consume a variety of formats through 
configuration alone, without us having to support each one individually.
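    
    For illustration, here is a minimal sketch of how the two switches might 
be combined to read a format that has no dedicated support.  The values 
`parquet` and `columnar` are assumptions chosen for the example, not confirmed 
option values from this PR.
    
    ```
    # Hypothetical values, for illustration only.
    # The storage format passed through to the underlying reader.
    profiler.batch.input.format=parquet
    # How the records read in that format are interpreted.
    profiler.batch.input.reader=columnar
    ```
    
    Under that assumption, switching to another format would only mean 
changing these two values, not adding code for the new format.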
    
    That being said, I don't know how useful that is to our users.  How many 
formats will users want to consume?  How much is that flexibility actually 
worth? 
    
    At this point, since this is new functionality, I decided to err on the 
side of greater flexibility over simplicity. Knowing that reasoning, let me 
know if you still think we should go for simplicity over flexibility.


