anitakar commented on issue #15933: URL: https://github.com/apache/airflow/issues/15933#issuecomment-846450124
@potiuk Thank you for pointing out that the leveldb package is outside of the cloud package ( https://github.com/apache/airflow/tree/master/airflow/providers/google/leveldb ) and also that it is an extra ( https://github.com/apache/airflow/blob/master/airflow/providers/google/provider.yaml#L751 ).

I am a little confused, though. I see plyvel on the list of requirements on https://pypi.org/project/apache-airflow-providers-google/ and I do not see any extra name there. OK, the change was merged 4 days ago ( https://github.com/apache/airflow/pull/15812 ). Yep, it seems to solve the problem of a problematic dependency on Mac. So there is nothing more to be done there, at least in my opinion.

Going back to the splitting discussion. I can see that having to download and install lots of libraries when you just want to use, for example, firebase can lead to dependency conflicts with your own code, even though you do not use those other dependencies when you are using Google providers. Also, why download all those binaries and, for example, make your Docker image bigger?

But managing dependencies is a complicated and, in my opinion, quite unsolvable problem. Actually, the way it is now, and as Jarek pointed out, when you install the Google providers package, it has already taken care of resolving those conflicts, at least for the Google providers package's own code.

If I could, I would like to run a poll here: are people unhappy about having to download all the libraries in the Google provider package and resolve dependency conflicts against them? Or are they happy that somebody has already fixed those dependencies so that they match each other?

When it comes to backwards compatibility: it is quite easy to do a split at the top level of the google providers package ( https://github.com/apache/airflow/tree/master/airflow/providers/google ). But it seems impossible inside the cloud package ( https://github.com/apache/airflow/tree/master/airflow/providers/google/cloud ), as packages there are split by technical function (hooks, sensors, operators ...)
rather than by domain/service (automl, bigquery, etc.). Operators could keep their imports ( https://github.com/apache/airflow/tree/master/airflow/providers/google/cloud/operators ), but the matching hooks and sensors couldn't, so the change would be breaking by definition.

Personally, in software development I am all for splitting packages by domain, not by technical function. But I am also against forcing everyone to modify their code if there is no strong reason.

I would also like to say that these are my personal opinions, not my employer's.

On Sat, 22 May 2021 at 18:41, Jarek Potiuk ***@***.***> wrote:

> Yeah. I see the case @eladkal <https://github.com/eladkal>. But I kind of
> see how Plyvel is very different from all other Google "core" components.
> Which cloud/ads/marketing_platform might be really "core" google business,
> where Plyvel is more of a "side" thing and could be fully separated out. I
> think it is much easier to separate out plyvel as a provider than splitting
> google "core" one.
>
> I am quite torn on this one. I see the benefits of keeping all those as
> single "provider" - on the other hand the benefits of splitting it are
> "potentially good" but it makes them also difficult to work together when
> installed with all the dependencies - it will be easy for example to run in
> a situation where "cloud" dependencies are different (and conflicting) with
> "ads" dependencies. I think until we figure out (possibly) some way of
> separating the dependencies out between tasks, this might be a difficult
> one to tackle and rather than "origin" (google), the "independence" of a
> package might be more important.
>
> I think the "extras" approach where plyvel is "optional" google extra
> dependency is maybe not perfect, but it kind of join both worlds. On one
> hand we have them together by "ownership", on the other hand we have just a
> problem of "environment" set properly for plyvel to work (so plyvel db libs
> installed).
> I think the current approach is kinda sustainable and does not
> introduce too much of complexity.
>
> As usually with "compromise" solutions, it's not perfect but maybe it's
> "good enough"? As long as it works, it could be "OK". I am also quite OK to
> separate just plyvel as @anitakar <https://github.com/anitakar> suggested.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
