Github user trixpan commented on the issue:
https://github.com/apache/nifi/pull/475
@joewitt, as a user / proto-developer I would be happy with any approach
that results in workable binaries covering a particular flavour / supported
platform without having to commit a patch every time I pull from git.
Agree that perhaps settings.xml would be enough, but I generally think the
less we fiddle with wider settings (settings.xml, for example) the better.
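To illustrate the concern, here is a minimal sketch of what the settings.xml route could look like (profile id and repository URL are assumptions, purely for illustration). It works, but it lives outside the repository and affects every build on the machine:
```
<!-- Hypothetical ~/.m2/settings.xml fragment; id and URL are illustrative. -->
<settings>
  <profiles>
    <profile>
      <id>cdh5</id>
      <repositories>
        <repository>
          <id>cloudera-releases</id>
          <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
        </repository>
      </repositories>
    </profile>
  </profiles>
  <!-- Applies to every Maven build on this machine, not just NiFi -->
  <activeProfiles>
    <activeProfile>cdh5</activeProfile>
  </activeProfiles>
</settings>
```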
I also thought that a profile is better than creating documentation
articles, because I personally believe that code is generally more likely to be
looked after than documentation.
Maybe it's just me as a total Java and Maven newbie, but nothing beats the
simplicity of `-Phadoop_flavour=cdh5`
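For illustration only, a minimal sketch of what such a profile could look like in the `hadoop-libraries-nar` pom. The profile id, activation property, repository URL and version string below are all assumptions, not the actual PR content:
```
<!-- Hypothetical vendor profile; ids, URLs and versions are illustrative.
     Activate with: mvn -Dhadoop.flavour=cdh5 clean package -->
<profiles>
  <profile>
    <id>cdh5</id>
    <activation>
      <property>
        <name>hadoop.flavour</name>
        <value>cdh5</value>
      </property>
    </activation>
    <properties>
      <!-- Vendor builds embed the vendor name in the version string -->
      <hadoop.version>2.6.0-cdh5.7.0</hadoop.version>
    </properties>
    <repositories>
      <repository>
        <id>cloudera-releases</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
      </repository>
    </repositories>
  </profile>
</profiles>
```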
I fully agree we must ensure licensing is properly taken care of. Having
said that, I have always been under the impression that unlike the GPL, the ASL
does not impose restrictions around *linking* non-ASF code. So hypothetically
speaking I suspect we could even go to the extreme length of releasing
binaries linking to non-ASL code as long as the foreign code licenses are
respected (e.g. ASL software does not exclude GPL licensed code; it is the [GPL
- through its terms - that excludes linking by ASL licensed
code](http://www.apache.org/licenses/GPL-compatibility.html)).
Having said that, given the [presence of MapR hadoop related
code on
github](https://github.com/mapr/hadoop-common/blob/release-2.7.0-mapr-1506/hadoop-hdfs-project/hadoop-hdfs/pom.xml),
I suspect that their hadoop artifacts are released under ASL 2.0, but perhaps
one of theirs, like @tdunning, can help shed some light.
But back to the profile:
The reason I ended up trying the profile approach is the fact that Spark
refers directly to Cloudera's and MapR's repositories in its main
[pom.xml](https://github.com/apache/spark/blob/branch-1.6/pom.xml#L285). This
led me to conclude (perhaps incorrectly) that it would be OK to have a pom
pointing to a particular set of artifacts as long as the binary produced by the
formal release does not break ASF or foreign licensing restrictions.
To be honest, Spark's approach is even simpler than using profiles:
their pom.xml includes all repos enabled by default and [lets the user
specify the hadoop version
as](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version):
```
# Cloudera CDH 4.2.0 with MapReduce v1
mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -Phadoop-1 -DskipTests clean package
```
Smartly playing with the fact that while vendors must respect the artifact
ID, they tend to distinguish their supported code by embedding their names in
the software version (e.g. 2.0.0-mr1-cdh4.2.0, hadoop-hdfs-2.x-mapr-1506, etc.).
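That is what makes the single-property approach work: the coordinates stay stable across vendors, and only the version property changes. A sketch of the idea (the artifact below is just one example of the pattern):
```
<!-- Coordinates stay stable across vendors; only the version
     property selects which vendor build gets resolved. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <!-- e.g. 2.0.0-mr1-cdh4.2.0 (Cloudera) or 2.7.0-mapr-1506 (MapR) -->
  <version>${hadoop.version}</version>
</dependency>
```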
Yet, given that supporting Spark's approach would require changes to the
main pom.xml, I decided to keep the scope of changes minimal, changing
only the `hadoop-libraries-nar` pom, hence reducing the potential of changes
spilling beyond what is planned/needed.
Hope this makes my way of thinking a bit clearer.
Please let me know your preference and I will be happy to adjust.