That sounds great. I'm subscribed to that list as well, so I'll keep an eye out for your email.
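
Since it may help whoever picks this up on dev@calcite: my understanding of
why your one-string rewrite (described below) works is that Calcite's driver
derives URL matching from a single connect-string prefix. Roughly this, from
my reading of calcite-core (a sketch, not a literal patch; class and method
names as I remember them):

    // org.apache.calcite.jdbc.Driver, as patched before shading (sketch).
    public class Driver extends UnregisteredDriver {

      // Was "jdbc:calcite:". UnregisteredDriver.acceptsURL() is a
      // startsWith() check against this prefix, so with the new value
      // Beam's repackaged driver and Spark's built-in Calcite driver no
      // longer claim the same URLs, and DriverManager can't pick the
      // wrong one.
      public static final String CONNECT_STRING_PREFIX = "jdbc:beam:";

      @Override protected String getConnectStringPrefix() {
        return CONNECT_STRING_PREFIX;
      }

      // ... rest of the driver unchanged ...
    }

Any internal code that builds "jdbc:calcite:" URLs to reconnect through the
DriverManager has to be rewritten to the new prefix as well, which I suspect
is why this is awkward to do purely at shading time.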
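
Separately, I think the open question in your investigation below (why
DriverManager.java#L556 doesn't throw) has a mundane answer.
Class.forName(name, true, classLoader) does not search only the loader you
pass in; it delegates to the parent loaders first. The loader you inspected
only contains org.apache.beam.** classes, but one of its ancestors, Spark's
application classloader, does have org.apache.calcite.jdbc.Driver on its
classpath, so the lookup succeeds. The check is then reference equality
between the loaded class and the registered driver's class, which holds for
Spark's own Calcite driver, so that driver is "allowed" to serve the
connection. Paraphrasing the JDK method (simplified, not the literal source):

    import java.sql.Driver;

    // Simplified paraphrase of java.sql.DriverManager#isDriverAllowed.
    static boolean isDriverAllowed(Driver driver, ClassLoader classLoader) {
      if (driver == null) {
        return false;
      }
      Class<?> aClass = null;
      try {
        // Class.forName consults the parent loaders before classLoader
        // itself, so Spark's copy of org.apache.calcite.jdbc.Driver is
        // found even though the fat jar's loader only knows
        // org.apache.beam.** classes.
        aClass = Class.forName(driver.getClass().getName(), true, classLoader);
      } catch (Exception ex) {
        // Reached only if no loader in the chain can see the class;
        // aClass stays null and the equality check below fails.
      }
      // Reference equality: true for Spark's registered Calcite driver.
      return aClass == driver.getClass();
    }

So I would drop option 1 from your list: the JDK is behaving as designed
there, and the fix has to stop the two drivers from claiming the same URL
prefix.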
Andrew

On Sun, Jul 29, 2018 at 9:07 PM Kai Jiang <[email protected]> wrote:

> Hi Andrew,
>
> I tried replacing "jdbc:calcite" with "jdbc:beam" in Calcite and
> re-shading. After that, Beam SQL can run on Spark. However, I didn't
> find a way to make that change during the shading of the Calcite
> library, so I think the second method you mentioned is the feasible
> one. I'll forward this thread to dev@calcite to see if we can connect
> between Calcite modules without using the DriverManager.
>
> Best,
> Kai
>
> On Tue, Jul 24, 2018 at 1:04 PM Kai Jiang <[email protected]> wrote:
>
>> Thank you, Andrew! I will take a look at whether it is feasible to
>> rewrite "jdbc:calcite:" in Beam's repackaged Calcite.
>>
>> Best,
>> Kai
>>
>> On 2018/07/24 19:08:17, Andrew Pilloud <[email protected]> wrote:
>>
>>> I don't really think this is something that involves changes to
>>> DriverManager. Beam is causing the problem by relocating Calcite's
>>> path but not also modifying the global state it creates.
>>>
>>> Andrew
>>>
>>> On Tue, Jul 24, 2018 at 12:03 PM Kai Jiang <[email protected]> wrote:
>>>
>>>> Thanks, Andrew! That is really helpful. I'll try shading Calcite
>>>> with the "jdbc:calcite" rewrite.
>>>> I also had a look at the DriverManager doc. Do you think it would
>>>> help to list all the repackaged jdbc drivers in the driver property
>>>> setting, like below?
>>>> jdbc.drivers=org.apache.beam.repackaged.beam.
>>>>
>>>> Best,
>>>> Kai
>>>>
>>>> On 2018/07/24 16:56:50, Andrew Pilloud <[email protected]> wrote:
>>>>
>>>>> Looks like Calcite isn't easily repackageable. This issue can be
>>>>> fixed either in our shading (by also rewriting the "jdbc:calcite:"
>>>>> string when we shade Calcite) or in Calcite (by not using the
>>>>> DriverManager to connect between Calcite modules).
>>>>>
>>>>> Andrew
>>>>>
>>>>> On Mon, Jul 23, 2018 at 11:18 PM Kai Jiang <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I hit an issue when running Beam SQL on Spark, and I want to check
>>>>>> whether anyone has seen the same thing. I believe getting Beam SQL
>>>>>> to run on Spark is important, so if you have encountered this
>>>>>> problem, any input would be really helpful.
>>>>>>
>>>>>> Context:
>>>>>> I set up a TPC framework to run SQL on Spark. The code
>>>>>> <https://github.com/vectorijk/beam/blob/tpch/sdks/java/extensions/tpc/src/main/java/org/apache/beam/sdk/extensions/tpc/BeamTpc.java>
>>>>>> is simple: it just ingests CSV data and applies SQL to it. The
>>>>>> Gradle build
>>>>>> <https://github.com/vectorijk/beam/blob/tpch/sdks/java/extensions/tpc/build.gradle>
>>>>>> includes `runner-spark` and the necessary libraries. The exception
>>>>>> stack trace
>>>>>> <https://gist.github.com/vectorijk/849cbcd5bce558e5e7c97916ca4c793a>
>>>>>> shows the details. The same code runs successfully on Flink and
>>>>>> Dataflow.
>>>>>>
>>>>>> Investigations:
>>>>>> BEAM-3386 <https://issues.apache.org/jira/browse/BEAM-3386>
>>>>>> describes a similar issue. It took me some time to investigate. I
>>>>>> believe there is a version conflict between the Calcite library in
>>>>>> Spark and the Calcite that Beam SQL repackages. The Calcite version
>>>>>> Spark (* - 2.3.1) uses is very old (1.2.0-incubating).
>>>>>>
>>>>>> After packaging the fat jar and submitting it to Spark, both the
>>>>>> old Calcite jdbc driver and Beam's repackaged jdbc driver are
>>>>>> registered in registeredDrivers (DriverManager.java#L294
>>>>>> <https://github.com/JetBrains/jdk8u_jdk/blob/master/src/share/classes/java/sql/DriverManager.java#L294>),
>>>>>> and DriverManager always connects to Spark's old Calcite driver
>>>>>> instead of Beam's repackaged one.
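>>>>>>
>>>>>> (To make the double registration visible from the driver program,
>>>>>> a minimal sketch along these lines can be used; I have not run
>>>>>> this exact code on the cluster:)
>>>>>>
>>>>>> import java.sql.Driver;
>>>>>> import java.sql.DriverManager;
>>>>>> import java.util.Enumeration;
>>>>>>
>>>>>> // Lists every jdbc driver registered in this JVM and asks each
>>>>>> // one whether it claims the "jdbc:calcite:" prefix. On Spark,
>>>>>> // both org.apache.calcite.jdbc.Driver (the 1.2.0-incubating copy)
>>>>>> // and Beam's repackaged driver should answer true -- the
>>>>>> // collision.
>>>>>> public class DriverAudit {
>>>>>>   public static void main(String[] args) throws Exception {
>>>>>>     Enumeration<Driver> drivers = DriverManager.getDrivers();
>>>>>>     while (drivers.hasMoreElements()) {
>>>>>>       Driver d = drivers.nextElement();
>>>>>>       System.out.println(d.getClass().getName()
>>>>>>           + " accepts jdbc:calcite: -> "
>>>>>>           + d.acceptsURL("jdbc:calcite:"));
>>>>>>     }
>>>>>>   }
>>>>>> }
>>>>>>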
>>>>>> Looking at DriverManager.java#L556
>>>>>> <https://github.com/JetBrains/jdk8u_jdk/blob/master/src/share/classes/java/sql/DriverManager.java#L556>,
>>>>>> I inserted a breakpoint at:
>>>>>>
>>>>>> aClass = Class.forName(driver.getClass().getName(), true, classLoader);
>>>>>>
>>>>>> At the breakpoint, driver.getClass().getName() ->
>>>>>> "org.apache.calcite.jdbc.Driver", and classLoader only contains
>>>>>> classes under 'org.apache.beam.**' and
>>>>>> 'org.apache.beam.repackaged.beam_***'. (There is no
>>>>>> 'org.apache.calcite.*' on its class path.)
>>>>>>
>>>>>> Oddly, aClass is still assigned the class
>>>>>> "org.apache.calcite.jdbc.Driver". I would expect an exception to be
>>>>>> raised and the driver to be skipped, but that does not happen, so
>>>>>> Spark's Calcite jdbc driver gets connected and all logic afterwards
>>>>>> goes to Spark's Calcite classpath. I believe that is the pivot
>>>>>> point.
>>>>>>
>>>>>> Potential solutions:
>>>>>> 1. Figure out why DriverManager.java#L556
>>>>>> <https://github.com/JetBrains/jdk8u_jdk/blob/master/src/share/classes/java/sql/DriverManager.java#L556>
>>>>>> does not throw an exception.
>>>>>>
>>>>>> I guess this is the best option.
>>>>>>
>>>>>> 2. Upgrade Spark's Calcite.
>>>>>>
>>>>>> This is not a good option, because the old Calcite version is used
>>>>>> by many Spark versions.
>>>>>>
>>>>>> 3. Do not repackage the Calcite library.
>>>>>>
>>>>>> I tried this: I built the fat jar with non-repackaged Calcite, but
>>>>>> Spark still uses its own Calcite.
>>>>>>
>>>>>> Also, I am curious whether there is a specific reason we need the
>>>>>> repackaging strategy for Calcite. @Mingmin Xu <[email protected]>
>>>>>>
>>>>>> Thanks for reading!
>>>>>>
>>>>>> Best,
>>>>>> Kai
