Julian, thank you for your suggestions.
I don't want to monitor a file and read appended records. Initially I want to read from an in-memory stream. Such a stream can be very, very large and may not fit in memory.

My idea is to create a NiFi processor which uses SQL for data manipulation. https://nifi.apache.org/ NiFi already contains a large set of processors which filter, split, route, etc. different data. Data can be CSV, JSON, Avro, whatever. Different processors use different parameters to control how data should be filtered, split, routed, etc. I think it would be nice to be able to use an SQL statement to specify how data should be filtered, split, etc. Because NiFi is able to handle very big data sets (called FlowFiles in NiFi), streaming is a must.

I created a very simple POC of how to use a Stream instead of a File. I just created new modified versions of:

src/main/java/org/apache/calcite/adapter/csv/CsvEnumerator2.java
src/main/java/org/apache/calcite/adapter/csv/CsvSchema2.java
src/main/java/org/apache/calcite/adapter/csv/CsvSchemaFactory2.java
src/main/java/org/apache/calcite/adapter/csv/CsvTableScan2.java
src/main/java/org/apache/calcite/adapter/csv/CsvTranslatableTable2.java

CALCITE-1227 describes a slightly different use case. I am ready to contribute back, but my Calcite knowledge is very limited, so the current POC is more like a hack and not good code. Should I upload my current POC files to CALCITE-1227, or is it better to create another issue?

Thanks
toivo

2016-05-04 19:02 GMT+03:00 Julian Hyde <[email protected]>:

> I’ve logged https://issues.apache.org/jira/browse/CALCITE-1227. Feel free
> to start implementing it!
>
>> On May 4, 2016, at 8:56 AM, Julian Hyde <[email protected]> wrote:
>>
>> It’s not straightforward to re-use a table adapter as a stream adapter.
>> The reason is that one query might want to see the past (the current
>> contents of the table) and another query might want to see the future
>> (the stream of records added from this point on).
>> I’m guessing that you want something like the CSV adapter that watches
>> a file and reports records added to the end of the file (like the tail
>> command [1]).
>>
>> You’d have to change CsvTable to implement StreamableTable, and
>> implement the ‘Table stream()’ method to return a variant of the table
>> that is in “follow” mode.
>>
>> It would probably be implemented by a variant of CsvEnumerator, but it
>> is getting its input in bursts, as the file is appended to.
>>
>> Hope that helps.
>>
>> Julian
>>
>> [1] https://en.wikipedia.org/wiki/Tail_(Unix)
>>
>>> On May 2, 2016, at 3:15 AM, Toivo Adams <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> One possibility is to modify CsvEnumerator
>>> Opinions?
>>>
>>> Thanks
>>> Toivo
>>>
>>> 2016-05-01 18:35 GMT+03:00 Toivo Adams <[email protected]>:
>>>
>>>> Hi,
>>>>
>>>> Please help newbie.
>>>> CSV works well reading files, but I want to read data from stream.
>>>> Data is not fixed length, may be endless stream.
>>>>
>>>> Any ideas how to accomplish this?
>>>> Should I try to modify CsvTranslatableTable?
>>>> Or should I take Cassandra adapter as example?
>>>>
>>>> Initially data will be CSV but later Avro is also good candidate.
>>>>
>>>> Thanks
>>>> Toivo
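[Editor's sketch] The core of the POC described above (reading CSV records one at a time from a stream instead of a file, so an endless stream is never fully materialized) can be illustrated with a minimal standalone class. This is an assumption-laden sketch, not the actual CsvEnumerator2: the class name is hypothetical, the comma split ignores quoting, and Calcite's real Enumerator interface lives in org.apache.calcite.linq4j — this class only mirrors its moveNext()/current() contract without depending on Calcite.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

/**
 * Standalone sketch of a stream-backed CSV enumerator. It mirrors the
 * moveNext()/current() contract of Calcite's linq4j Enumerator without
 * depending on Calcite, and pulls one record at a time so an arbitrarily
 * large (even endless) stream is never held in memory at once.
 */
public class CsvStreamEnumerator implements AutoCloseable {
  private final BufferedReader reader;
  private String[] current;

  /** The Reader could wrap any InputStream, e.g. a NiFi FlowFile's content. */
  public CsvStreamEnumerator(Reader in) {
    this.reader = new BufferedReader(in);
  }

  /** Advances to the next record; returns false at end of stream. */
  public boolean moveNext() {
    try {
      String line = reader.readLine();
      if (line == null) {
        current = null;
        return false;
      }
      // Naive split; a real CSV adapter must handle quoting and escapes.
      current = line.split(",", -1);
      return true;
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  /** Returns the record produced by the last successful moveNext(). */
  public String[] current() {
    return current;
  }

  @Override
  public void close() throws IOException {
    reader.close();
  }

  public static void main(String[] args) throws IOException {
    try (CsvStreamEnumerator e =
             new CsvStreamEnumerator(new StringReader("a,1\nb,2\n"))) {
      while (e.moveNext()) {
        System.out.println(Arrays.toString(e.current()));
      }
    }
  }
}
```

To turn this into the streaming variant Julian describes, the enumerator would additionally block (or poll) for new input instead of returning false at end of stream, and the table exposing it would implement StreamableTable.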
