Can you clarify what you mean by Stream? Do you mean java.util.stream.Stream<String>?

The CSV adapter is first and foremost a file adapter. It might be easier to 
create a stream adapter and make it parse CSV than to start with a file 
adapter and make it handle streams.
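To make the "stream adapter that parses CSV" direction concrete, here is a minimal sketch using only the JDK (no Calcite types; the class and method names are mine, not the adapter's). It iterates CSV rows lazily from a java.io.Reader, so the source can be any stream rather than a file; the naive comma split stands in for a real CSV parser.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Sketch: enumerate CSV rows from a Reader rather than a File. */
public class CsvStreamSketch {

  /** Lazily yields one String[] per CSV line read from the stream. */
  static Iterator<String[]> rows(Reader in) {
    BufferedReader reader = new BufferedReader(in);
    return new Iterator<String[]>() {
      String next = readLine();

      String readLine() {
        try {
          return reader.readLine();
        } catch (java.io.IOException e) {
          throw new RuntimeException(e);
        }
      }

      @Override public boolean hasNext() {
        return next != null;
      }

      @Override public String[] next() {
        String line = next;
        next = readLine();       // advance; null signals end of stream
        return line.split(",", -1);  // naive split; no quoting support
      }
    };
  }

  public static void main(String[] args) {
    // The Reader could just as well wrap a socket or a NiFi FlowFile stream.
    Reader in = new StringReader("id,name\n1,alice\n2,bob\n");
    List<String> out = new ArrayList<>();
    for (Iterator<String[]> it = rows(in); it.hasNext();) {
      out.add(String.join("|", it.next()));
    }
    System.out.println(out);  // prints [id|name, 1|alice, 2|bob]
  }
}
```

Because the iterator never materializes the whole input, it works for inputs larger than memory, which is the property Toivo needs below.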

> On May 4, 2016, at 10:07 AM, Toivo Adams <[email protected]> wrote:
> 
> Julian,
> 
> Thank you for your suggestions.
> 
> I don't want to monitor a file and read appended records.
> Initially I want to read from an in-memory stream.
> Such a stream can be very, very large and doesn't fit in memory.
> 
> My idea is to create a NiFi processor which uses SQL for data manipulation.
> https://nifi.apache.org/
> 
> NiFi already contains a large set of processors which filter, split, route,
> etc. different kinds of data.
> The data can be CSV, JSON, Avro, whatever.
> 
> Different processors use different parameters for how data should be
> filtered, split, routed, etc.
> I think it would be nice to be able to use a SQL statement to specify how
> data should be filtered, split, etc.
> Because NiFi handles very big data sets (called FlowFiles in NiFi),
> streaming is a must.
> 
> I created a very simple POC of how to use a Stream instead of a File.
> I just created new modified versions of
> src/main/java/org/apache/calcite/adapter/csv/CsvEnumerator2.java
> src/main/java/org/apache/calcite/adapter/csv/CsvSchema2.java
> src/main/java/org/apache/calcite/adapter/csv/CsvSchemaFactory2.java
> src/main/java/org/apache/calcite/adapter/csv/CsvTableScan2.java
> src/main/java/org/apache/calcite/adapter/csv/CsvTranslatableTable2.java
> 
> CALCITE-1227 describes a slightly different use case.
> 
> I am ready to contribute back, but my Calcite knowledge is very limited,
> so the current POC is more of a hack than good code.
> Should I upload my current POC files to CALCITE-1227,
> or is it better to create another issue?
> 
> Thanks
> toivo
> 
> 
> 2016-05-04 19:02 GMT+03:00 Julian Hyde <[email protected]>:
> 
>> I’ve logged https://issues.apache.org/jira/browse/CALCITE-1227 <
>> https://issues.apache.org/jira/browse/CALCITE-1227>. Feel free to start
>> implementing it!
>> 
>>> On May 4, 2016, at 8:56 AM, Julian Hyde <[email protected]> wrote:
>>> 
>>> It’s not straightforward to re-use a table adapter as a stream adapter.
>> The reason is that one query might want to see the past (the current
>> contents of the table) and another query might want to see the future (the
>> stream of records added from this point on).
>>> 
>>> I’m guessing that you want something like the CSV adapter that watches a
>> file and reports records added to the end of the file (like the tail
>> command[1]).
>>> 
>>> You’d have to change CsvTable to implement StreamableTable, and
>> implement the ‘Table stream()’ method to return a variant of the table that
>> is in “follow” mode.
>>> 
>>> It would probably be implemented by a variant of CsvEnumerator, one that
>> gets its input in bursts as the file is appended to.
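[Editor's note: the bursty, append-driven reading described above can be sketched with plain JDK I/O. No Calcite types appear here, and all names are hypothetical; a follow-mode CsvEnumerator variant would do roughly this per burst: re-read from the last consumed offset and emit only complete lines.]

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Sketch of "follow" mode: re-read from the last offset to pick up appended lines. */
public class TailFollowSketch {

  /** Reads complete lines starting at offset into sink; returns the new offset. */
  static long readBurst(Path file, long offset, List<String> sink) throws IOException {
    byte[] all = Files.readAllBytes(file);
    String chunk =
        new String(all, (int) offset, all.length - (int) offset, StandardCharsets.UTF_8);
    int end = chunk.lastIndexOf('\n');   // only consume complete lines
    if (end < 0) {
      return offset;                     // nothing complete yet; poll again later
    }
    for (String line : chunk.substring(0, end).split("\n")) {
      sink.add(line);
    }
    // Advance past everything consumed, including the trailing newline.
    return offset + chunk.substring(0, end + 1).getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("follow", ".csv");
    List<String> rows = new ArrayList<>();

    Files.write(file, "1,alice\n".getBytes(StandardCharsets.UTF_8));
    long offset = readBurst(file, 0, rows);   // first burst: existing contents

    // Simulate a writer appending a record after the first read.
    Files.write(file, "2,bob\n".getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.APPEND);
    offset = readBurst(file, offset, rows);   // second burst: appended records

    System.out.println(rows);  // prints [1,alice, 2,bob]
    Files.deleteIfExists(file);
  }
}
```

A real implementation would poll (or use a file-watch service) between bursts and block inside the enumerator; the offset-tracking logic is the essential part.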
>>> 
>>> Hope that helps.
>>> 
>>> Julian
>>> 
>>> [1] https://en.wikipedia.org/wiki/Tail_(Unix) <
>> https://en.wikipedia.org/wiki/Tail_(Unix)>
>>> 
>>>> On May 2, 2016, at 3:15 AM, Toivo Adams <[email protected] <mailto:
>> [email protected]>> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> One possibility is to modify CsvEnumerator.
>>>> Opinions?
>>>> 
>>>> Thanks
>>>> Toivo
>>>> 
>>>> 2016-05-01 18:35 GMT+03:00 Toivo Adams <[email protected] <mailto:
>> [email protected]>>:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> Please help newbie.
>>>>> The CSV adapter works well for reading files, but I want to read data
>>>>> from a stream.
>>>>> The data is not fixed length; it may be an endless stream.
>>>>> 
>>>>> Any ideas how to accomplish this?
>>>>> Should I try to modify CsvTranslatableTable?
>>>>> Or should I take Cassandra adapter as example?
>>>>> 
>>>>> Initially the data will be CSV, but later Avro is also a good candidate.
>>>>> 
>>>>> Thanks
>>>>> Toivo
>>>>> 
>>> 
>> 
>> 
