[jira] [Commented] (HADOOP-17098) Reduce Guava dependency in Hadoop source code

Ahmed Hussein (Jira) Thu, 09 Dec 2021 08:06:08 -0800


    [ 
https://issues.apache.org/jira/browse/HADOOP-17098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456529#comment-17456529
 ]


Ahmed Hussein commented on HADOOP-17098:
----------------------------------------

Thanks [~ayushtkn],
I completely understand your point.
There is one important factor that is not mentioned here: many of the Guava 
features we use have since become part of the JDK, so keeping them is an 
unnecessary redundancy.
Also, based on what I have seen in the code, using the Guava library became the 
norm (maybe due to coding style and copy-paste).
For example, I saw hundreds of places that initialize lists and sets through 
Guava for no apparent reason. 
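
To illustrate the point about collection initialization, here is a minimal 
sketch of the JDK-only equivalents. The class and method names are hypothetical 
and not taken from the Hadoop code base; the Guava calls appear only in 
comments for comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class for illustration only.
final class JdkCollectionInit {
  private JdkCollectionInit() {}

  // Guava style: List<String> l = Lists.newArrayList();
  // JDK style: the diamond operator (Java 7+) makes the helper unnecessary.
  static List<String> emptyList() {
    return new ArrayList<>();
  }

  // Guava style: Set<String> s = Sets.newHashSet("a", "b");
  // JDK style: copy the varargs array into a mutable HashSet.
  static Set<String> setOf(String... items) {
    return new HashSet<>(Arrays.asList(items));
  }

  public static void main(String[] args) {
    List<String> hosts = emptyList();
    hosts.add("nn1");
    Set<String> racks = setOf("r1", "r2", "r1");
    System.out.println(hosts + " " + racks);
  }
}
```

Both helpers return mutable collections, which matches what the Guava 
factory methods return.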

bq. We would load our new classes now? How much impact the replacement will 
have. Wouldn't this be true for all 3rd Party Libraries 

It is true that we had to add wrappers to match the API. However, those new 
wrappers use JDK features instead of Guava classes. 
Regarding 3rd party libraries, this would not be true if the library adds 
value to the code. If a library provides an API that is already in the JDK, 
then we should revisit and question the usage of that library.
The problem with the Guava usages is that they are unnecessary in many places 
and provide almost nothing on JDK 8+.
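
As a rough sketch of that overlap, the following uses only JDK classes for 
three of the replacements listed in the issue description (Base64 encoding, 
joining strings, and optional values). The class and method names are 
hypothetical and chosen only for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical demo class for illustration only.
final class JdkEquivalents {
  private JdkEquivalents() {}

  // Instead of com.google.common.io.BaseEncoding#base64()
  static String encode(String s) {
    return Base64.getEncoder()
        .encodeToString(s.getBytes(StandardCharsets.UTF_8));
  }

  // Instead of com.google.common.base.Joiner.on(",")
  static String join(List<String> parts) {
    return String.join(",", parts);
  }

  // Collectors#joining() fits when the values already come from a stream.
  static String joinNonEmpty(List<String> parts) {
    return parts.stream()
        .filter(p -> !p.isEmpty())
        .collect(Collectors.joining(","));
  }

  // Instead of com.google.common.base.Optional#fromNullable()
  static String orDefault(String maybeNull, String fallback) {
    return Optional.ofNullable(maybeNull).orElse(fallback);
  }
}
```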

bq. Now we have implemented these and that too on similar lines, now if there 
is a problem. now we will be also responsible. Along with core hadoop stuff, we 
have to manage this as well.

I do not see that Guava really did any better regarding security. Upgrading 
the Guava dependency is always a pain, and Hadoop gets stuck with a vulnerable 
Guava release for quite some time.
We implemented very basic wrappers that call JDK classes (Preconditions, 
Supplier, Predicate, etc.). If there is a security issue, then it is most 
probably a JDK-related issue.
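
To give an idea of how thin such a wrapper can be, here is a minimal sketch. 
It is not the actual Hadoop wrapper; the class name and method signatures are 
assumptions made for this example, and each method only delegates to the JDK 
or throws a standard JDK exception.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical wrapper for illustration; not the class used in Hadoop.
final class SimplePreconditions {
  private SimplePreconditions() {}

  // Delegates to the JDK; throws NullPointerException when obj is null.
  static <T> T checkNotNull(T obj, String message) {
    return Objects.requireNonNull(obj, message);
  }

  // The message is built lazily, only when the check fails.
  static void checkArgument(boolean expression, Supplier<String> message) {
    if (!expression) {
      throw new IllegalArgumentException(message.get());
    }
  }

  static void checkState(boolean expression, String message) {
    if (!expression) {
      throw new IllegalStateException(message);
    }
  }

  public static void main(String[] args) {
    String path = checkNotNull("/tmp/data", "path must not be null");
    checkArgument(!path.isEmpty(), () -> "path must not be empty");
    checkState(path.startsWith("/"), "path must be absolute");
    System.out.println("all checks passed for " + path);
  }
}
```

With a wrapper of this shape there is no Guava code on the call path, so any 
defect would be either in these few lines or in the JDK itself.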
 
bq. On a lighter note: Does this mean the code we write doesn't need 
performance analysis?

Of course we need to evaluate the code to identify the hot paths. This will 
enable us to improve the execution time and the space usage as needed, for 
example by optimizing a loop, pooling allocations, or replacing a lambda.
With Guava, it is a different story because it provides an entire package. For 
example, Guava collections are different from Java collections and can give 
completely different performance. In addition, we would still have to do the 
same evaluation of the code structure.
There won't be fine-grained control over Guava. 


> Reduce Guava dependency in Hadoop source code
> ---------------------------------------------
>
>                 Key: HADOOP-17098
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17098
>             Project: Hadoop Common
>          Issue Type: Task
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>
> Relying on Guava implementation in Hadoop has been painful due to 
> compatibility and vulnerability issues.
>  Guava updates tend to break/deprecate APIs. This made it hard to maintain 
> backward compatibility within Hadoop versions and clients/downstreams.
> With 3.x using Java 8+, Java 8 features should be preferred to Guava, 
> reducing the footprint and giving stability to the source code.
> This jira should serve as an umbrella toward an incremental effort to reduce 
> the usage of Guava in the source code and to create subtasks to replace Guava 
> classes with Java features.
> Furthermore, it will be good to add a rule in the pre-commit build to warn 
> against introducing a new Guava usage in certain modules.
> Anyone willing to take part in this code refactoring has to:
>  # Focus on one module at a time in order to reduce the conflicts and the 
> size of the patch. This will significantly help the reviewers.
>  # Run all the unit tests related to the module being affected by the change. 
> It is critical to verify that any change will not break the unit tests, or 
> cause a stable test case to become flaky.
>  # Merge should be done to the following branches:  trunk, branch-3.3, 
> branch-3.2, branch-3.1
>  
> A list of sub tasks replacing Guava APIs with java8 features:
> {code:java}
> com.google.common.io.BaseEncoding#base64()      java.util.Base64
> com.google.common.io.BaseEncoding#base64Url()   java.util.Base64
> com.google.common.base.Joiner.on()              java.lang.String#join() or
>                                                 java.util.stream.Collectors#joining()
> com.google.common.base.Optional#of()            java.util.Optional#of()
> com.google.common.base.Optional#absent()        java.util.Optional#empty()
> com.google.common.base.Optional#fromNullable()  java.util.Optional#ofNullable()
> com.google.common.base.Optional                 java.util.Optional
> com.google.common.base.Predicate                java.util.function.Predicate
> com.google.common.base.Function                 java.util.function.Function
> com.google.common.base.Supplier                 java.util.function.Supplier
> {code}
>  
> I also vote for the replacement of {{Preconditions}} with either a wrapper or 
> Apache Commons Lang.
> I believe you guys have dealt with Guava compatibility issues in the past and 
> probably have better insights. Any thoughts? [~weichiu], [~gabor.bota], 
> [[email protected]], [~ayushtkn], [~busbey], [~jeagles], [~kihwal]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

