[ 
https://issues.apache.org/jira/browse/HADOOP-17098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456426#comment-17456426
 ] 

Ayush Saxena edited comment on HADOOP-17098 at 12/9/21, 2:15 PM:
-----------------------------------------------------------------

Going through these changes. Looks OK to do, if in the end we eliminate Guava
and that does some wonders.

But has any benchmarking been done after replacing these classes? Though the
changes are trivial, are they creating any negative performance impact? This
touches the entire code base.

Well, sharing thoughts: I thought along similar lines as [~csun]. I think
the major problem that Guava used to cause was conflicting versions, as it
wasn't backward compatible. Shading solved that problem. Now this is
additional work trying to solve a similar problem, with different
assertions.
{quote}reduce memory footprint. (less classes to load)
{quote}
We would load our new classes now? How much impact will the replacement have?
Wouldn't this be true for all third-party libraries?
{quote}better code management: Guava will likely have security updates that
force Hadoop to adopt new releases and deal with compatibilities.
{quote}
Now we have implemented these ourselves, and along similar lines. If there is
a problem, we will also be responsible for it. Along with the core Hadoop
code, we have to maintain this as well.
{quote}avoid struggles analyzing guava performance
{quote}
On a lighter note: Does this mean the code we write doesn't need performance 
analysis?

Pointers to any organisation or other Apache projects that did such an
exercise would be welcome.

This exercise isn't very specific to Hadoop; other projects can use these
utils irrespective of whether they depend on Hadoop or not. The new classes
in particular might have found a home in Apache Commons or some related
project.

 

Just an FYI.

We can't eliminate the original Guava jar; it is there for downstream
projects relying on it.

--Not blocking anyone nor denying the idea. Please don't take this otherwise.


was (Author: ayushtkn):
Going through these changes. Looks ok to do, if in the end we eliminate guava 
and that does some wonders.

But. Is there any benchmarking done once we replace the stuff. Though trivial 
aren't they creating any negative performance impact. This touches the entire 
code base.

Well, sharing thoughts, I thought on similar lines as [~csun] thought. I think 
the major problem that Guava use to cause is due to conflicting versions, as it 
wasn't backward compatible. Post shading that problem got solved. Now this is 
some additional stuff, trying to solve the similar problem, now with different 
assertions.
{quote}reduce memory fooprint. (less classes to load)
{quote}
We would load our new classes now? How much impact the replacement will have. 
Wouldn't this true for all 3rd Party Libraries 
{quote}better code management: Guava will likely has security updates that 
forces Hadoop to adopt new releases and dealing with compatibilities.
{quote}
Now we have implemented these and that too in similar lines, now if there is a 
problem. now we will be also responsible. Along with core hadoop stuff, we have 
to manage this as well.
{quote}avoid struggles analyzing guava performance
{quote}
On a lighter note: Does this mean the code we right doesn't need performance 
analysis?

In case someone has pointers about the organisation or other Apache projects 
who did such an exercise. 

This exercise isn't very specific to Hadoop, all other projects can use these 
utils irrespective whether they depend on hadoop or not. Might have found some 
place in Apache Commons or some related place, especially the new classes which 
are written.

 

Just an FYI.

The original Guava jar we can't eliminate, that is there for downstream 
projects relying on it.

--Not blocking anyone nor denying the idea, Please don't take this otherwise.

> Reduce Guava dependency in Hadoop source code
> ---------------------------------------------
>
>                 Key: HADOOP-17098
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17098
>             Project: Hadoop Common
>          Issue Type: Task
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>
> Relying on Guava implementation in Hadoop has been painful due to 
> compatibility and vulnerability issues.
>  Guava updates tend to break/deprecate APIs. This made it hard to maintain
> backward compatibility within Hadoop versions and clients/downstreams.
> Since 3.x uses Java 8+, Java 8 features should be preferred to Guava, reducing
> the footprint and giving stability to the source code.
> This jira should serve as an umbrella toward an incremental effort to reduce 
> the usage of Guava in the source code and to create subtasks to replace Guava 
> classes with Java features.
> Furthermore, it will be good to add a rule in the pre-commit build to warn 
> against introducing a new Guava usage in certain modules.
> Anyone willing to take part in this code refactoring has to:
>  # Focus on one module at a time in order to reduce the conflicts and the 
> size of the patch. This will significantly help the reviewers.
>  # Run all the unit tests related to the module being affected by the change. 
> It is critical to verify that any change will not break the unit tests, or 
> cause a stable test case to become flaky.
>  # Merge should be done to the following branches:  trunk, branch-3.3, 
> branch-3.2, branch-3.1
>  
> A list of sub tasks replacing Guava APIs with java8 features:
> {code:java}
> com.google.common.io.BaseEncoding#base64()      java.util.Base64
> com.google.common.io.BaseEncoding#base64Url()   java.util.Base64
> com.google.common.base.Joiner.on()              java.lang.String#join() or
>                                                 java.util.stream.Collectors#joining()
> com.google.common.base.Optional#of()            java.util.Optional#of()
> com.google.common.base.Optional#absent()        java.util.Optional#empty()
> com.google.common.base.Optional#fromNullable()  java.util.Optional#ofNullable()
> com.google.common.base.Optional                 java.util.Optional
> com.google.common.base.Predicate                java.util.function.Predicate
> com.google.common.base.Function                 java.util.function.Function
> com.google.common.base.Supplier                 java.util.function.Supplier
> {code}
>  
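For illustration, a few of the mappings in the table above might look like this in plain Java 8. This is only a sketch: the class name GuavaToJava8Sketch and its helper methods are hypothetical and are not taken from the Hadoop patches.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.stream.Collectors;

/** Hypothetical demo class; not part of Hadoop. */
final class GuavaToJava8Sketch {

  // Guava: BaseEncoding.base64().encode(bytes)
  public static String base64(String s) {
    return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
  }

  // Guava: BaseEncoding.base64Url().encode(bytes)
  public static String base64Url(byte[] bytes) {
    return Base64.getUrlEncoder().encodeToString(bytes);
  }

  // Guava: Joiner.on(",").join(parts)
  public static String join(List<String> parts) {
    return String.join(",", parts);
  }

  // Guava: Joiner applied to a filtered collection; here a stream
  // with java.util.function.Predicate and Collectors.joining()
  public static String joinNonEmpty(List<String> parts) {
    Predicate<String> nonEmpty = s -> !s.isEmpty();
    return parts.stream().filter(nonEmpty).collect(Collectors.joining(","));
  }

  // Guava: Optional.fromNullable(value).or(fallbackSupplier)
  public static String orDefault(String value, Supplier<String> fallback) {
    return Optional.ofNullable(value).orElseGet(fallback);
  }

  public static void main(String[] args) {
    System.out.println(base64("hi"));
    System.out.println(join(Arrays.asList("a", "b", "c")));
    System.out.println(joinNonEmpty(Arrays.asList("a", "", "b")));
    System.out.println(orDefault(null, () -> "fallback"));
  }
}
```

One behavioural difference to watch for when swapping these in: Guava's Joiner throws a NullPointerException on a null element by default, while String.join() renders it as the text "null".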
> I also vote for the replacement of {{Preconditions}} with either a wrapper, or
> Apache Commons Lang.
> I believe you guys have dealt with Guava compatibilities in the past and 
> probably have better insights. Any thoughts? [~weichiu], [~gabor.bota], 
> [[email protected]], [~ayushtkn], [~busbey], [~jeagles], [~kihwal]
>  
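As a rough idea of what the wrapper option mentioned in the description could look like, here is a minimal sketch. The class name PreconditionsSketch and its method set are assumptions made for illustration; they are not the API that was proposed or committed in Hadoop.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for Guava's Preconditions; illustrative only. */
final class PreconditionsSketch {

  private PreconditionsSketch() {
    // static helpers only
  }

  /** Returns ref if it is non-null, otherwise throws NullPointerException. */
  public static <T> T checkNotNull(T ref, String message) {
    if (ref == null) {
      throw new NullPointerException(message);
    }
    return ref;
  }

  /** Throws IllegalArgumentException when the caller passed a bad argument. */
  public static void checkArgument(boolean expression, String message) {
    if (!expression) {
      throw new IllegalArgumentException(message);
    }
  }

  /** Same check, but the message is only built when the check fails. */
  public static void checkArgument(boolean expression, Supplier<String> message) {
    if (!expression) {
      throw new IllegalArgumentException(message.get());
    }
  }

  /** Throws IllegalStateException when the object is in an invalid state. */
  public static void checkState(boolean expression, String message) {
    if (!expression) {
      throw new IllegalStateException(message);
    }
  }
}
```

The Supplier overload is one possible design choice: it avoids building the error message on the hot path, which bears on the performance question raised in the comment above.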



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

