It sounds to me as if you will be able to get something half-baked working on InputStream easily, but it won't support parallelism, distribution, reliability, fail-over, or replay.
I don't have a problem with that. You can write your application now and
re-deploy on a more robust streaming SQL implementation (Samza, Flink,
Storm, etc.) later.

But be sure that you are using the "stream" keyword in your queries, e.g.

  select stream * from Orders where units > 10

I have a feeling that you have faked out Calcite into believing that your
InputStream is the Orders *table*, whereas you should be making Calcite
believe that your InputStream is the Orders *stream*, so that if you ever
were to write

  select count(*) from Orders

Calcite would be able to go somewhere else for the historical table.

Julian

On Wed, May 4, 2016 at 10:44 AM, Toivo Adams <[email protected]> wrote:
> It's
> java.io.InputStream
>
> Actually it was not very difficult to modify the CSV adapter to accept a
> stream instead of a file.
> But of course, the current POC may have some problems which have not
> shown up yet.
>
> CSVReader accepts a stream reader also.
>
>
> 2016-05-04 20:28 GMT+03:00 Julian Hyde <[email protected]>:
>
>> Can you clarify what you mean by Stream? Do you mean
>> java.util.Stream<String>?
>>
>> The Csv adapter is first and foremost a file adapter. It might be easier
>> to create a stream adapter and make it parse CSV than to start with a
>> file adapter and make it handle streams.
>>
>> > On May 4, 2016, at 10:07 AM, Toivo Adams <[email protected]> wrote:
>> >
>> > Julian,
>> >
>> > Thank you for your suggestions.
>> >
>> > I don't want to monitor a file and read appended records.
>> > Initially I want to read from an in-memory stream.
>> > Such a stream can be very, very large and may not fit in memory.
>> >
>> > My idea is to create a NiFi processor which uses SQL for data
>> > manipulation.
>> > https://nifi.apache.org/
>> >
>> > NiFi already contains a large set of processors which filter, split,
>> > route, etc. different data.
>> > Data can be CSV, JSON, Avro, whatever.
>> >
>> > Different processors use different parameters for how data should be
>> > filtered, split, routed, etc.
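[Editor's note: Julian's table-versus-stream distinction above can be made concrete with a small sketch. In Calcite proper, a table that also has a streaming variant implements org.apache.calcite.schema.StreamableTable, whose stream() method returns the streaming form of the table; to keep this example runnable without the Calcite jar, the class below models the two views with plain collections instead. The name OrdersRelation, the row layout, and the method names are invented for illustration only.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Toy model of a relation that is both a table and a stream.
 * A plain "select count(*) from Orders" would read the historical
 * contents (scan()), while "select stream * from Orders" would
 * consume only rows arriving from now on (nextStreamRow()).
 */
class OrdersRelation {
  private final List<String[]> history = new ArrayList<>();           // the Orders *table*
  private final BlockingQueue<String[]> arrivals =
      new LinkedBlockingQueue<>();                                    // the Orders *stream*

  /** A new order lands in both the historical table and the live stream. */
  void insert(String[] row) {
    history.add(row);
    arrivals.add(row);
  }

  /** Historical contents: what a non-STREAM query sees. */
  List<String[]> scan() {
    return new ArrayList<>(history);
  }

  /** Next streaming row, or null if nothing new has arrived yet. */
  String[] nextStreamRow() {
    return arrivals.poll();
  }
}
```

The point of keeping both sides behind one relation is exactly Julian's: if the planner knows the object is a stream with a historical table behind it, it can answer `count(*)` from the table and `select stream` from the live feed.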
>> > I think it would be nice to be able to use an SQL statement to specify
>> > how data should be filtered, split, etc.
>> > Because NiFi is able to use very big data sets (called FlowFiles in
>> > NiFi), streaming is a must.
>> >
>> > I created a very simple POC of how to use a Stream instead of a File.
>> > I just created new, modified versions of
>> > src/main/java/org/apache/calcite/adapter/csv/CsvEnumerator2.java
>> > src/main/java/org/apache/calcite/adapter/csv/CsvSchema2.java
>> > src/main/java/org/apache/calcite/adapter/csv/CsvSchemaFactory2.java
>> > src/main/java/org/apache/calcite/adapter/csv/CsvTableScan2.java
>> > src/main/java/org/apache/calcite/adapter/csv/CsvTranslatableTable2.java
>> >
>> > CALCITE-1227 describes a slightly different use case.
>> >
>> > I am ready to contribute back, but my Calcite knowledge is very
>> > limited, so the current POC is more like a hack than good code.
>> > Should I upload my current POC files to CALCITE-1227,
>> > or is it better to create another issue?
>> >
>> > Thanks
>> > toivo
>> >
>> >
>> > 2016-05-04 19:02 GMT+03:00 Julian Hyde <[email protected]>:
>> >
>> >> I've logged https://issues.apache.org/jira/browse/CALCITE-1227.
>> >> Feel free to start implementing it!
>> >>
>> >>> On May 4, 2016, at 8:56 AM, Julian Hyde <[email protected]> wrote:
>> >>>
>> >>> It's not straightforward to re-use a table adapter as a stream
>> >>> adapter. The reason is that one query might want to see the past
>> >>> (the current contents of the table) and another query might want to
>> >>> see the future (the stream of records added from this point on).
>> >>>
>> >>> I'm guessing that you want something like the CSV adapter that
>> >>> watches a file and reports records added to the end of the file
>> >>> (like the tail command [1]).
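[Editor's note: the core of the CsvEnumerator2-style POC discussed above — reading CSV rows from a java.io.InputStream rather than a file — can be sketched as follows. The Enumerator interface declared here is a simplified stand-in for org.apache.calcite.linq4j.Enumerator (same four methods), so the sketch compiles without the Calcite jar; the class name CsvStreamEnumerator and the naive comma split are assumptions for illustration, not the actual POC code.]

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Simplified stand-in for org.apache.calcite.linq4j.Enumerator. */
interface Enumerator<T> {
  T current();
  boolean moveNext();
  void reset();
  void close();
}

/** Enumerates CSV rows pulled from an InputStream, one line per row. */
class CsvStreamEnumerator implements Enumerator<String[]> {
  private final BufferedReader reader;
  private String[] current;

  CsvStreamEnumerator(InputStream in) {
    this.reader = new BufferedReader(
        new InputStreamReader(in, StandardCharsets.UTF_8));
  }

  @Override public String[] current() {
    return current;
  }

  @Override public boolean moveNext() {
    try {
      String line = reader.readLine();
      if (line == null) {
        return false;                    // end of stream reached
      }
      current = line.split(",", -1);     // naive split; no quoting support
      return true;
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  @Override public void reset() {
    // A one-shot stream cannot be rewound, unlike a file.
    throw new UnsupportedOperationException("cannot reset a stream");
  }

  @Override public void close() {
    try {
      reader.close();
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }
}
```

Note the reset() behavior: this is one concrete way a stream-backed enumerator differs from the file-backed CsvEnumerator, which can always re-open the file.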
>> >>>
>> >>> You'd have to change CsvTable to implement StreamableTable, and
>> >>> implement the 'Table stream()' method to return a variant of the
>> >>> table that is in "follow" mode.
>> >>>
>> >>> It would probably be implemented by a variant of CsvEnumerator, but
>> >>> one that gets its input in bursts, as the file is appended to.
>> >>>
>> >>> Hope that helps.
>> >>>
>> >>> Julian
>> >>>
>> >>> [1] https://en.wikipedia.org/wiki/Tail_(Unix)
>> >>>
>> >>>> On May 2, 2016, at 3:15 AM, Toivo Adams <[email protected]> wrote:
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> One possibility is to modify CsvEnumerator.
>> >>>> Opinions?
>> >>>>
>> >>>> Thanks
>> >>>> Toivo
>> >>>>
>> >>>> 2016-05-01 18:35 GMT+03:00 Toivo Adams <[email protected]>:
>> >>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> Please help a newbie.
>> >>>>> CSV works well for reading files, but I want to read data from a
>> >>>>> stream. The data is not fixed length; it may be an endless stream.
>> >>>>>
>> >>>>> Any ideas how to accomplish this?
>> >>>>> Should I try to modify CsvTranslatableTable?
>> >>>>> Or should I take the Cassandra adapter as an example?
>> >>>>>
>> >>>>> Initially the data will be CSV, but later Avro is also a good
>> >>>>> candidate.
>> >>>>>
>> >>>>> Thanks
>> >>>>> Toivo
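[Editor's note: the "follow" mode Julian describes — an enumerator that receives its input in bursts as the file is appended to, like tail -f — can be sketched with a blocking queue between a watcher and the enumerator. The class FollowEnumerator and the methods onAppend()/endOfStream() are invented names for this sketch, not Calcite API; a real implementation would sit behind StreamableTable.stream() and be fed by whatever watches the file.]

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Enumerator for "follow" mode: moveNext() blocks until the watcher
 * delivers the next appended line, so rows arrive in bursts rather
 * than from a finite scan.
 */
class FollowEnumerator {
  /** Sentinel marking that no more lines will ever arrive. */
  static final String[] EOS = new String[0];

  private final BlockingQueue<String[]> appended = new LinkedBlockingQueue<>();
  private String[] current;

  /** Called by the watcher thread for each line appended to the file. */
  void onAppend(String line) {
    appended.add(line.split(",", -1));
  }

  /** Called by the watcher when the stream is closed for good. */
  void endOfStream() {
    appended.add(EOS);
  }

  String[] current() {
    return current;
  }

  boolean moveNext() {
    try {
      String[] row = appended.take();  // blocks between bursts
      if (row == EOS) {
        return false;
      }
      current = row;
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

The watcher side could be a thread re-reading past the last known file offset, or java.nio.file.WatchService events; either way, the enumerator itself only sees a queue of new rows, which is what makes it a stream rather than a table scan.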
