Thank you Andrew! I will take a look at if it is feasible to rewrite  
"jdbc:calcite:" in Beam's repackaged calcite.

Best,
Kai

On 2018/07/24 19:08:17, Andrew Pilloud <[email protected]> wrote: 
> I don't really think this is something that involves changes to
> DriverManager. Beam is causing the problem by relocating calcite's path but
> not also modifying the global state it creates.
> 
> Andrew
> 
> On Tue, Jul 24, 2018 at 12:03 PM Kai Jiang <[email protected]> wrote:
> 
> > Thanks Andrew! It's really helpful. I'll take a try on shade calcite with
> > rewriting the "jdbc:calcite".
> > I also have a look at the doc of DriverManager. Do you think include all
> > repackaged jdbc driver property setting like below will be helpful?
> >  jdbc.drivers=org.apache.beam.repackaged.beam.
> >
> > Best,
> > Kai
> >
> > On 2018/07/24 16:56:50, Andrew Pilloud <[email protected]> wrote:
> > > Looks like calcite isn't easily repackageable. This issue can be fixed
> > > either in our shading (by also rewriting the "jdbc:calcite:" string when
> > we
> > > shade calcite) or in calcite (by not using the driver manager to connect
> > > between calcite modules).
> > >
> > > Andrew
> > >
> > > On Mon, Jul 23, 2018 at 11:18 PM Kai Jiang <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I met an issue when I ran Beam SQL on Spark. I want to check and see if
> > > > anyone has same issue with me. I believe let beam sql running on spark
> > is
> > > > important. If you encountered same problem, it will be really helpful
> > if
> > > > you could give some inputs.
> > > >
> > > > Context:
> > > > I setup TPC framework to run sql on spark. Code
> > > > <
> > https://github.com/vectorijk/beam/blob/tpch/sdks/java/extensions/tpc/src/main/java/org/apache/beam/sdk/extensions/tpc/BeamTpc.java
> > >
> > > > is simple which just ingests csv data and apply Sql on that. Gradle
> > > > <
> > https://github.com/vectorijk/beam/blob/tpch/sdks/java/extensions/tpc/build.gradle>
> > setting
> > > > includes `runner-spark` and necessary libraries.  Exception Stack trace
> > > > <https://gist.github.com/vectorijk/849cbcd5bce558e5e7c97916ca4c793a>
> > shows
> > > > some details. However, same code can running on Flink and Dataflow
> > > > successfully.
> > > >
> > > > Investigations:
> > > > BEAM-3386 <https://issues.apache.org/jira/browse/BEAM-3386> also
> > > > describes the similar issue I have. It took me some time on
> > investigating
> > > > it. I guess there should be a version conflict between Calcite library
> > in
> > > > Spark and Beam SQL repackaged Calcite. The version of Calcite library
> > Spark
> > > > ( * - 2.3.1) used is very old (1.2.0-incubating).
> > > >
> > > > After packaging fat jar and submitting it to Spark, Spark registered
> > both
> > > > old version's calcite jdbc driver and Beam's repackaged jdbc driver in
> > > >
> > > > registeredDrivers(DriverManager.java#L294 <
> > https://github.com/JetBrains/jdk8u_jdk/blob/master/src/share/classes/java/sql/DriverManager.java#L294>).
> > Jdbc's DriverManager always connects to old version calcite's jdbc in spark
> > instead of beam's repackaged calcite.
> > > >
> > > >
> > > > Looking into Line DriverManager.java#L556 <
> > https://github.com/JetBrains/jdk8u_jdk/blob/master/src/share/classes/java/sql/DriverManager.java#L556>
> > and insert a breakpoint, aClass =
> > Class.forName(driver.getClass().getName(), true, classLoader);
> > > >
> > > > driver.getClass().getName() -> "org.apache.calcite.jdbc.Driver"
> > > > classLoader only has class 'org.apache.beam.**' and
> > > > 'org.apache.beam.repackaged.beam_***'. (There is no path of class
> > > > 'org.apache.calcite.*')
> > > >
> > > > Oddly, aClass is assigned with Class "org.apache.calcite.jdbc.Driver".
> > I
> > > > think it should raise an exception and be skipped. Actually, It did
> > not.  So
> > > > this spark's calcite jdbc driver has been connected. All logic
> > afterwards
> > > > goes to spark's calcite classpath. I believe that's pivot point.
> > > >
> > > > Potentially solutions:
> > > > *1.* Figure out why DriverManager.java#L556
> > > > <
> > https://github.com/JetBrains/jdk8u_jdk/blob/master/src/share/classes/java/sql/DriverManager.java#L556>
> > does
> > > > not throw exception.
> > > >
> > > > I guess it is the best option.
> > > >
> > > > 2. Upgrade Spark' calcite.
> > > >
> > > > It is not a good option because old calcite version affects many spark
> > > > versions.
> > > >
> > > > 3. Not using repackage for calcite library.
> > > >
> > > > I tried. I built fat jar with non-repackaged calcite. But, Spark is
> > still
> > > > using its own calcite.
> > > >
> > > > Plus, I am curious if there is any specific reason we need to use
> > > > repackage strategy for Calcite. @Mingmin Xu <[email protected]>
> > > >
> > > >
> > > > Thanks for reading!
> > > >
> > > > Best,
> > > > Kai
> > > > ᐧ
> > > >
> > >
> >
> 

Reply via email to