Github user nickwallen commented on a diff in the pull request:
https://github.com/apache/metron/pull/1229#discussion_r223743370
--- Diff: metron-analytics/metron-profiler-spark/README.md ---
@@ -265,6 +290,18 @@ The path to the input data read by the Batch Profiler.
The format of the input data read by the Batch Profiler.
+### `profiler.batch.input.reader`
--- End diff --
It is a valid option. The only reason I did not do that is that we would
have to explicitly support each format, such as JSON, CSV, ORC, and Parquet.
With these two switches, a user can consume a variety of formats through
configuration alone, without us having to explicitly support each one.
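To illustrate the trade-off, here is a minimal Python sketch (not the Profiler's actual code). It assumes the reader switch only decides how records are interpreted, while the format value is passed straight through to a loader. `read_messages` and `LOADERS` are hypothetical names, and the loader table stands in for whatever data sources the underlying engine already provides.

```python
import csv
import io
import json

# Hypothetical loader table, standing in for data sources the engine already provides.
LOADERS = {
    "text": lambda raw: raw.splitlines(),
    "csv": lambda raw: list(csv.DictReader(io.StringIO(raw))),
}


def read_messages(raw, reader, fmt):
    """Read records using two independent switches (illustrative only)."""
    records = LOADERS[fmt](raw)  # 'fmt' is passed through; no per-format branch here
    if reader == "text":
        # Each record is a line of text holding one JSON message.
        return [json.loads(line) for line in records if line.strip()]
    if reader == "columnar":
        # Each record is already a row of named columns.
        return [dict(row) for row in records]
    raise ValueError(f"unknown reader: {reader}")
```

Under these assumptions, supporting another columnar format means adding a loader entry, not a new reader; with a single switch, each format would need its own branch in `read_messages`.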
That said, I don't know how useful that is to the user
population. How many formats will users want to consume? How useful is that
flexibility?
At this point, since this is new functionality, I decided to err on the
side of greater flexibility over simplicity. Knowing that reasoning, let me
know if you still think we should go for simplicity over flexibility.
---