On second thought, that makes sense: we didn't touch the V1 driver when porting to Spark 3. Your suggestion sounds good.
On Thu, Aug 29, 2024 at 11:53 AM Rejeb Ben Rejeb <benrejebre...@gmail.com> wrote:
> Yes, it's correct.
>
> On Thu, Aug 29, 2024 at 11:12, Istvan Toth <st...@cloudera.com.invalid> wrote:
>
> > So the V1 code in Spark 3 still behaves as it does in Spark 2, which is
> > different from the V2 code in Spark 3.
> > Do I understand correctly?
> >
> > On Thu, Aug 29, 2024 at 10:28 AM Rejeb Ben Rejeb <benrejebre...@gmail.com>
> > wrote:
> >
> > > Sorry, I explained it badly. I meant that Spark 3 will support both Append
> > > and Overwrite mode, and both will behave the same way.
> > > I agree that the new one is correct and that we shouldn't add support for
> > > Overwrite mode.
> > > I think it is a better option than keeping the old V1 code.
> > > I don't think it is possible to gate the new behavior by overriding Spark
> > > internal code.
> > >
> > > On Thu, Aug 29, 2024 at 09:20, Istvan Toth <st...@cloudera.com.invalid>
> > > wrote:
> > >
> > > > On Wed, Aug 28, 2024 at 2:49 PM Rejeb Ben Rejeb <benrejebre...@gmail.com>
> > > > wrote:
> > > >
> > > > > On Wed, Aug 28, 2024 at 10:17, Istvan Toth <st...@cloudera.com.invalid>
> > > > > wrote:
> > > > >
> > > > > > On Mon, Aug 26, 2024 at 1:59 PM Rejeb Ben Rejeb <benrejebre...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > REMOVE DATASOURCE V1:
> > > > > > > After removing the V1 code, it is possible to configure the V1 name as
> > > > > > > a V2 long name.
> > > > > > > I have to test it to be sure, but this can be done by moving the
> > > > > > > class PhoenixDataSource under the package "org.apache.phoenix.spark".
> > > > > > > This way, it will have no impact on old applications which use the
> > > > > > > Spark API.
> > > > > > >
> > > > > > Would creating a compatibility child of the driver under the old
> > > > > > package name work?
> > > > > > I don't like the idea of moving the up-to-date code to a new package.
> > > > > >
> > > > > I did some tests: just moving PhoenixDataSource is not enough. I think the
> > > > > behavior has changed since the last time I worked on a connector; now the
> > > > > class has to be renamed to DefaultSource to make it work.
> > > > > The best solution would be to make DefaultSource inherit from
> > > > > PhoenixDataSource, which avoids moving and renaming classes.
> > > > >
> > > > Sounds good.
> > > >
> > > > > For the Spark 3 connector, a small change is needed to make it accept
> > > > > Overwrite mode; it will then behave the same as Append mode.
> > > > > That's OK for me, since it is meant to maintain backward compatibility.
> > > > >
> > > > We've changed that once; I wouldn't change the behaviour again (especially
> > > > as, IMO, the new one is correct).
> > > > I think it would be best to gate that new behaviour behind an option.
> > > >
> > > > > > > When I wrote my first message, I forgot that there are also helper
> > > > > > > methods like "phoenixTableAsDataFrame" or "saveToPhoenix". For those we
> > > > > > > have two options:
> > > > > > >
> > > > > > >    1. Assume that these methods are no longer maintained, document that
> > > > > > >    the Spark API should be used instead, and remove them.
> > > > > > >    2. Keep the methods and change their implementation to point to the
> > > > > > >    V2 datasource (all V1 options are available with V2).
> > > > > > >
> > > > > > > Personally, I prefer option 1, as old Scala or Java applications need
> > > > > > > code and dependency updates to use a newer version of the connector
> > > > > > > anyway. Python or R applications will not be impacted, as they use the
> > > > > > > Spark API.
> > > > > > >
> > > > > > While I agree with you from a technical POV, the reality is that there
> > > > > > are a lot of legacy Spark jobs that I'd prefer not to break.
> > > > > > Option 2 sounds better to me.
> > > > > >
> > > > > > > BUILD ARTIFACTS WITH DIFFERENT SCALA VERSIONS:
> > > > > > > Yes. Since the connector for Spark 2 was compiled with Scala 2.11, it
> > > > > > > can't be run with a Spark 2 build compiled with Scala 2.12. The same
> > > > > > > applies to the Spark 3 connector with Scala 2.12 vs 2.13.
> > > > > > > I meant to have this for later releases. IMHO this is a limitation, and
> > > > > > > it would be good to have the connector built with both Scala versions
> > > > > > > so that usage is not restricted to only one Spark build.
> > > > > > > I've done some quick research; it seems there is a way to manage this
> > > > > > > with the scala-maven-plugin through multiple executions instead of
> > > > > > > using Maven profiles.
> > > > > > >
> > > > > > That sounds fine, please open a ticket, and a PR with your preferred
> > > > > > solution.
> > > > > >
> > > > > OK, I'll do it.
> > > > >
> > > > > > > Rejeb
> > > > > > >
> > > > > > > On Mon, Aug 26, 2024 at 08:49, Istvan Toth <st...@apache.org> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Forgive my ignorance of Spark:
> > > > > > > >
> > > > > > > > REMOVE DATASOURCE V1:
> > > > > > > >
> > > > > > > > IIRC the V1 and V2 datasources have different names.
> > > > > > > > Wouldn't this break applications using the old V1 name?
> > > > > > > > Is there a chance that this would break old applications?
> > > > > > > >
> > > > > > > > BUILD ARTIFACTS WITH DIFFERENT SCALA VERSIONS:
> > > > > > > >
> > > > > > > > Is this required because Scala 2.x runtimes are not backwards
> > > > > > > > compatible?
> > > > > > > > I don't see a problem with that.
> > > > > > > >
> > > > > > > > Its utility is limited until we start providing actual releases and
> > > > > > > > publishing binary artifacts, but in theory I agree.
> > > > > > > >
> > > > > > > > The implementation would be a bit tricky; the solution that comes to
> > > > > > > > my mind is generating the artifacts in multiple Maven runs with
> > > > > > > > different profiles, like we do for the different HBase profiles now.
> > > > > > > >
> > > > > > > > Istvan
> > > > > > > >
> > > > > > > > On Fri, Aug 23, 2024 at 7:36 PM Rejeb Ben Rejeb <benrejebre...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I would like to start a discussion about two changes to
> > > > > > > > > phoenix5-spark and phoenix5-spark3.
> > > > > > > > >
> > > > > > > > > REMOVE DATASOURCE V1
> > > > > > > > > It is no longer necessary to keep the Datasource V1 classes, since
> > > > > > > > > all features are implemented in the new connector version classes.
> > > > > > > > > When fixing the issue PHOENIX-6783, I checked for impacts and made
> > > > > > > > > some modifications to make removing the classes safe and without
> > > > > > > > > impact.
> > > > > > > > >
> > > > > > > > > BUILD ARTIFACTS WITH DIFFERENT SCALA VERSIONS
> > > > > > > > > The phoenix5-spark2 connector uses spark-2.4.8, which is available
> > > > > > > > > with Scala 2.11 and Scala 2.12.
> > > > > > > > > Likewise, phoenix5-spark3 uses spark-3.2.4, which is available
> > > > > > > > > with Scala 2.12 and Scala 2.13.
> > > > > > > > >
> > > > > > > > > It would be nice to have the connectors support both Scala
> > > > > > > > > versions, like other connectors such as MongoDB or Cassandra.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Rejeb
> > > > > > > > >
> > > > > > > --
> > > > > > > Best regards,
> > > > > > > Rejeb Ben Rejeb
> > > > > > >
> > > > > > --
> > > > > > *István Tóth* | Sr. Staff Software Engineer
> > > > > > *Email*: st...@cloudera.com
> > > > > > cloudera.com <https://www.cloudera.com>
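For reference, a sketch of why the DefaultSource approach works: when an application calls format("org.apache.phoenix.spark"), Spark first looks for a registered data source short name, then tries the string as a class name, and finally tries the same string with ".DefaultSource" appended. Since "org.apache.phoenix.spark" is a package rather than a class, only a class named DefaultSource in that package matches. The Java code below is a simplified, self-contained model of that lookup, not Spark's actual code; the V2 class name "org.apache.phoenix.spark.sql.connector.PhoenixDataSource" and the short name "phoenix" are assumptions for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class ProviderLookupSketch {

    // Simplified model of Spark's data source lookup for format(provider):
    // 1. match a registered short name (case-insensitive),
    // 2. otherwise try "provider" itself as a class name,
    // 3. otherwise try "provider.DefaultSource".
    // The real lookup also handles legacy aliases and ambiguous matches; omitted here.
    static Optional<String> resolve(String provider,
                                    Map<String, String> shortNames,
                                    Set<String> classPath) {
        for (Map.Entry<String, String> entry : shortNames.entrySet()) {
            if (entry.getKey().equalsIgnoreCase(provider)) {
                return Optional.of(entry.getValue());
            }
        }
        for (String candidate : List.of(provider, provider + ".DefaultSource")) {
            if (classPath.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed class names and short name, for illustration only.
        String v2Class = "org.apache.phoenix.spark.sql.connector.PhoenixDataSource";
        String movedClass = "org.apache.phoenix.spark.PhoenixDataSource";
        String shimClass = "org.apache.phoenix.spark.DefaultSource";

        // Scenario 1: PhoenixDataSource only moved into the old package.
        // The old V1 name still does not resolve.
        System.out.println(resolve("org.apache.phoenix.spark",
                Map.of("phoenix", movedClass), Set.of(movedClass)));
        // prints "Optional.empty"

        // Scenario 2: V2 class stays put; a DefaultSource subclass is added.
        Map<String, String> shortNames = Map.of("phoenix", v2Class);
        Set<String> classPath = Set.of(v2Class, shimClass);
        System.out.println(resolve("org.apache.phoenix.spark", shortNames, classPath));
        // prints "Optional[org.apache.phoenix.spark.DefaultSource]"

        // The short name keeps resolving to the V2 class.
        System.out.println(resolve("phoenix", shortNames, classPath));
        // prints "Optional[org.apache.phoenix.spark.sql.connector.PhoenixDataSource]"
    }
}
```

Under this model, the compatibility shim could be as small as a hypothetical subclass such as "public class DefaultSource extends PhoenixDataSource {}" in package org.apache.phoenix.spark, which keeps the up-to-date V2 code in its current package, as suggested in the thread.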