Forking this thread to a new topic: it has strayed quite a bit from the original discussion, which I think was about server-side Java implementations.
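To make James's question below concrete, here is a minimal sketch of one way a driver could at least detect the mixed usage. Everything in it is hypothetical: ExclusiveResult, nextRow(), and the Mode enum are names made up for illustration, the Iterator<String> merely stands in for a real Flight stream, and none of this is part of JDBC or of any existing Flight SQL driver. The idea is only that whichever access style is used first claims the result, and the other style then fails loudly instead of silently returning partial data.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: a result that allows row access or stream access, never both. */
final class ExclusiveResult {
    enum Mode { UNCLAIMED, ROWS, STREAM }

    // Assumption: a plain iterator stands in for the underlying Flight stream.
    private final Iterator<String> source;
    private Mode mode = Mode.UNCLAIMED;

    ExclusiveResult(List<String> rows) {
        this.source = rows.iterator();
    }

    /** Row-style access, loosely analogous to ResultSet.next() plus a getter. */
    String nextRow() {
        claim(Mode.ROWS);
        return source.hasNext() ? source.next() : null;
    }

    /** Direct access, loosely analogous to handing the application the stream. */
    Iterator<String> getStream() {
        claim(Mode.STREAM);
        return source;
    }

    /** The first caller fixes the access mode; a later switch is rejected. */
    private void claim(Mode requested) {
        if (mode == Mode.UNCLAIMED) {
            mode = requested;
        } else if (mode != requested) {
            throw new IllegalStateException(
                "result already claimed for " + mode + " access, cannot switch to " + requested);
        }
    }

    public static void main(String[] args) {
        ExclusiveResult result = new ExclusiveResult(List.of("row 1", "row 2"));
        Iterator<String> stream = result.getStream();
        System.out.println(stream.next());
        try {
            result.nextRow(); // going back to the row interface after taking the stream
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This only helps when the stream is handed out by the driver itself. If the application takes the reported endpoints and talks to the server directly, the driver still has no way of knowing that the data was consumed, which is the confusion James describes.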
On Tue, Mar 15, 2022 at 9:14 AM James Duong <jam...@bitquilltech.com.invalid> wrote:
> I could also see extensions to ODBC/JDBC being a point of confusion for app
> developers too.
>
> For example, if we were to add hooks in the JDBC driver to report endpoints
> so that applications can call getStream() directly, what would happen if the
> user started getting a stream then went back and tried to use the regular
> ResultSet interface? A stream would be consumed, but the driver wouldn't
> know it.
>
> On Tue, Mar 15, 2022 at 9:07 AM Kyle Porter <ky...@bitquilltech.com.invalid>
> wrote:
>
> > In general, I have problems with attempting to expose other extensions
> > through existing standards such as ODBC/JDBC. What it feels like we're
> > saying is: use the standard so you don't have to change any code, except
> > for this part where you must write custom code to take advantage of the
> > non-standard portions.
> >
> > At that point, why not just write something fully custom and take
> > advantage of the underlying interface?
> >
> > The higher level clients are meant to ease adoption and may be all that
> > existing applications use, but new applications can have a choice to use
> > the higher level clients or the lower level interface.
> >
> > *Kyle Porter*
> > CEO
> > Bit Quill Technologies Inc.
> > Office: +1.778.331.3355 | Direct: +1.604.441.7318 | ky...@bitquilltech.com
> > https://www.bitquill.com
> >
> > This email message is for the sole use of the intended recipient(s) and
> > may contain confidential and privileged information. Any unauthorized
> > review, use, disclosure, or distribution is prohibited. If you are not the
> > intended recipient, please contact the sender by reply email and destroy
> > all copies of the original message. Thank you.
> >
> > On Tue, Mar 15, 2022 at 7:55 AM David Li <lidav...@apache.org> wrote:
> >
> > > Aren't we getting a few things mixed up here?
> > >
> > > 1) As Micah says, the original proposal is about adapting Java types to
> > > Arrow. This can be used independently of Flight SQL. I don't think this
> > > was being pitched as a standard itself unless I'm mistaken?
> > >
> > > 2) Flight SQL the protocol, which _is_ a language agnostic standard,
> > > though maybe not the one applications will generally choose to consume.
> > >
> > > 3) Idiomatic/standard per-language APIs that build on Flight SQL, which
> > > will include JDBC/ODBC (there is a reference JDBC driver in the works
> > > [1]), but I agree there's room for something that uses Arrow types,
> > > supports partitioning, etc. as well. (And I agree there's room for
> > > something that supports these features but is _not_ Flight SQL
> > > underneath.)
> > >
> > > ---
> > >
> > > I'm not super experienced with JDBC/ODBC - would extending them
> > > basically mean something like (in JDBC) providing interfaces that
> > > Connections, ResultSets, etc. could be cast to to access the
> > > "Arrow-native" bits? And in ODBC, using something like the SQL_C_BINARY
> > > type to 'tunnel' Arrow data through ODBC buffers, and/or providing a set
> > > of C API functions that could convert between (say) an ODBC statement
> > > handle and an Arrow C Data Interface ArrowArrayStream?
> > >
> > > [1]: https://github.com/apache/arrow/pull/12254
> > >
> > > -David
> > >
> > > On Tue, Mar 15, 2022, at 01:06, Micah Kornfield wrote:
> > > > Hi Julian,
> > > >
> > > >> I like Gavin’s idea of a data-frame API. But Gavin, if you want to
> > > >> make it successful, build it on top of the leading API in each
> > > >> language (which in Java would be FlightSQL’s JDBC driver). I don’t
> > > >> see a good reason to expose through your API the fact that FlightSQL
> > > >> is underneath.
> > > >
> > > > My understanding is that this thread is all about implementing a
> > > > Flight server and making those ergonomics easier.
> > > > On the client side, I think the power of Flight/FlightSQL is twofold:
> > > > 1. Reference ODBC/JDBC drivers that can consume the wire format (and I
> > > > think many clients will go this route). I think these are in the
> > > > process of being contributed already. Which as you noted there is
> > > > power in standards, so I expect this avenue to see heavy use.
> > > > 2. For clients that can handle it and want to go through the trouble,
> > > > consuming the data directly as Arrow for efficiency purposes. I don't
> > > > think we've discussed canonical APIs by extending ODBC/JDBC but I like
> > > > that idea. That seems like a discussion for after we have working
> > > > JDBC/ODBC reference implementation though?
> > > >
> > > > I might have missed it but I don't think either approach on the client
> > > > side has been discussed on this thread. I also think this is why
> > > > Dataframe might not be the best name for the adapter because it comes
> > > > with all sorts of assumptions about usage both on a client and a
> > > > server.
> > > >
> > > > Cheers,
> > > > Micah
> > > >
> > > > On Mon, Mar 14, 2022 at 9:38 PM Julian Hyde <jhyde.apa...@gmail.com>
> > > > wrote:
> > > >
> > > >> When I read “language-agnostic standard for data access” I cringed a
> > > >> little. (See [1].)
> > > >>
> > > >> Sure, it’s fun to create a new standard. But if your standard is
> > > >> successful, there will need to be a huge amount of work changing
> > > >> existing code to use your standard. That effort might even be the
> > > >> difference between success and failure for a small project, and
> > > >> therefore you have helped protect the incumbents.
> > > >>
> > > >> My solution?
> > > >>
> > > >> I would like the FlightSQL authors to make clear that it is a wire
> > > >> protocol, and only a protocol.
> > > >>
> > > >> Rather than creating new APIs, I would like people to spend their
> > > >> effort implementing existing APIs (such as ODBC and JDBC) on top of
> > > >> FlightSQL.
> > > >>
> > > >> If those APIs are inadequate (e.g. they don’t provide access to the
> > > >> raw Arrow data, or don’t support INSERT or SELECT that are
> > > >> partitioned across several clients/servers), then add extensions to
> > > >> those APIs. But still implement the core APIs. When I describe a
> > > >> table from Java, I want a result set that exactly matches JDBC’s
> > > >> getTables [2].
> > > >>
> > > >> I like Gavin’s idea of a data-frame API. But Gavin, if you want to
> > > >> make it successful, build it on top of the leading API in each
> > > >> language (which in Java would be FlightSQL’s JDBC driver). I don’t
> > > >> see a good reason to expose through your API the fact that FlightSQL
> > > >> is underneath.
> > > >>
> > > >> Julian
> > > >>
> > > >> [1] https://xkcd.com/927/
> > > >>
> > > >> [2] https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html#getTables-java.lang.String-java.lang.String-java.lang.String-java.lang.String:A-
> > > >>
> > > >> > On Mar 12, 2022, at 12:14 PM, Gavin Ray <ray.gavi...@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > While trying to implement and introduce the idea of adopting
> > > >> > FlightSQL, the largest challenge was the API itself
> > > >> >
> > > >> > I know it's meant to be low-level. But I found that most of the
> > > >> > development time was in code to convert to/from row-based data (IE
> > > >> > Map<String, Object>) and Java types, and columnar data + Arrow
> > > >> > types.
> > > >> >
> > > >> > I'm likely in the minority position here -- I know that Arrow and
> > > >> > FlightSQL users are largely looking at transferring large volumes
> > > >> > of data and servicing OLAP-type workloads
> > > >> > But the thing that excites me most about FlightSQL isn't its
> > > >> > performance (always nice to have), but that it's a
> > > >> > language-agnostic standard for data access.
> > > >> >
> > > >> > That has broad implications -- for all kinds of data-access
> > > >> > workloads and business use cases.
> > > >> >
> > > >> > The challenge is that in trying to advocate for it, when presenting
> > > >> > a proof-of-concept, rather than what a developer might expect to
> > > >> > see, something like:
> > > >> >
> > > >> > // FlightSQL handler code
> > > >> > List<Map<String, Object>> results = ....;
> > > >> > results.add(Map.of("id", 1, "name", "Person 1"));
> > > >> > return results;
> > > >> >
> > > >> > A significant portion of the code is in Arrow-specific
> > > >> > implementation details: creating a VectorSchemaRoot, FieldVector,
> > > >> > de-serializing the results on the client, etc.
> > > >> >
> > > >> > Just curious whether there is any interest/intention of possibly
> > > >> > making a higher level API around the basic FlightSQL one?
> > > >> > Maybe something closer to the traditional notion of a row-based
> > > >> > "DataFrame" or "Table", like:
> > > >> >
> > > >> > DataFrame df = new DataFrame();
> > > >> > df.addColumn("id", ArrowTypes.Int);
> > > >> > df.addColumn("name", ArrowTypes.VarChar);
> > > >> > df.addRow(Map.of("id", 1, "name", "Person 1"));
> > > >> > VectorSchemaRoot root = df.toVectorSchemaRoot();
> > > >> > listener.setVectorSchemaRoot(root);
> > > >> > listener.sendVectorSchemaRootContents();
>
> --
> *James Duong*
> Lead Software Developer
> Bit Quill Technologies Inc.
> Direct: +1.604.562.6082 | jam...@bitquilltech.com
> https://www.bitquilltech.com
>
> This email message is for the sole use of the intended recipient(s) and may
> contain confidential and privileged information. Any unauthorized review,
> use, disclosure, or distribution is prohibited. If you are not the
> intended recipient, please contact the sender by reply email and destroy
> all copies of the original message. Thank you.