I added some more stats to the wiki page, trying to determine which
dependency jars are bundled in the nars. It seems like there is an
opportunity to cut duplication.

Highlights: 50 copies of what appears to be some version of bcprov-jdk15on,
for a total of 162MB, and 51 copies of jackson-databind.

  total size copies     jar
     30.97MB     65     META-INF/bundled-dependencies/commons-lang3-XXX.jar
     32.53MB     50     META-INF/bundled-dependencies/bcpkix-jdk15on-XXX.jar
     33.55MB     16     META-INF/bundled-dependencies/guava-XXX.jar
     39.62MB      1     META-INF/bundled-dependencies/jython-shaded-XXX.jar
     63.06MB     51     META-INF/bundled-dependencies/jackson-databind-XXX.jar
    162.07MB     50     META-INF/bundled-dependencies/bcprov-jdk15on-XXX.jar
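
For anyone who wants to reproduce a tally like the one above, here is a
rough sketch of one way to do it. It is not necessarily how the numbers
above were produced. It assumes each nar is an ordinary zip archive with
its dependency jars under META-INF/bundled-dependencies/, that the version
is everything from the first "-<digit>" up to ".jar", and that 1MB means
10^6 bytes. The function names are made up for illustration.

```python
import re
import sys
import zipfile
from collections import defaultdict
from pathlib import Path

PREFIX = "META-INF/bundled-dependencies/"
# Assumed heuristic: the version starts at the first "-<digit>" and ends at ".jar".
VERSION_RE = re.compile(r"-\d[^/]*\.jar$")


def normalize(entry_name):
    """Replace the version part of a jar entry name with XXX."""
    return VERSION_RE.sub("-XXX.jar", entry_name)


def tally(nar_paths):
    """Return {normalized jar name: [total uncompressed bytes, copies]}."""
    totals = defaultdict(lambda: [0, 0])
    for path in nar_paths:
        with zipfile.ZipFile(path) as nar:
            for info in nar.infolist():
                name = info.filename
                if name.startswith(PREFIX) and name.endswith(".jar"):
                    entry = totals[normalize(name)]
                    entry[0] += info.file_size  # size of the jar once extracted
                    entry[1] += 1
    return totals


def report(totals):
    """Format one line per jar, smallest total size first."""
    rows = sorted(totals.items(), key=lambda kv: kv[1][0])
    return [f"{size / 1e6:10.2f}MB {copies:6d}     {name}"
            for name, (size, copies) in rows]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally_nars.py <directory containing .nar files>
    nars = sorted(Path(sys.argv[1]).rglob("*.nar"))
    print("\n".join(report(tally(nars))))
```

Pointing it at the lib directory of an unpacked assembly should give one
line per distinct jar name, with versions collapsed so that different
versions of the same library are counted together.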


On Sat, Jan 13, 2018 at 2:09 PM, Joey Frazee <[email protected]> wrote:

> I tend to have feelings similar to Michael about a multi-repo approach.
> I’ve rarely seen it help and more often seen it hurt — it’s confusing
> (especially to newcomers), stuff gets neglected because it’s easier to
> ignore, you need another master project or some such to do an entire build.
>
> Maybe git submodules could help mitigate this, but creating independent
> assemblies or using different build profiles to enable building and
> packaging the binaries in different ways would satisfy everything except
> disentangling the releases.
>
> -joey
>
> On Jan 13, 2018, 12:40 PM -0600, Brandon DeVries <[email protected]>, wrote:
> > I agree... Long term extension registry, short term one repo with
> different
> > assemblies (e.g. standard, slim, analytic, etc...).
> >
> > Brandon
> >
> > On Sat, Jan 13, 2018 at 1:35 PM Pierre Villard <
> [email protected]
> > wrote:
> >
> > > Option #3 also has my preference. But it's probably a good idea to only
> > > keep one git repo and play with the assembly and Maven profiles for the
> > > releases, no? It'd be certainly easier for release management process.
> But
> > > this decision could also depend on how the option #3 is going to be
> > > implemented I guess.
> > >
> > > 2018-01-13 6:36 GMT-07:00 Joe Witt <[email protected]>:
> > >
> > > > thanks tony!
> > > >
> > > > On Jan 12, 2018 10:48 PM, "Tony Kurc" <[email protected]> wrote:
> > > >
> > > > > I put some of the data I was working with on the wiki -
> > > > >
> > > > > > https://cwiki.apache.org/confluence/display/NIFI/NiFi+1.5.0+nar+files
> > > > >
> > > > > On Fri, Jan 12, 2018 at 10:28 PM, Jeremy Dyer <[email protected]
> > > wrote:
> > > > >
> > > > > > So my favorite option is Bryan’s option number “three” of using
> the
> > > > > > extension registry. Now my thought is do we really need to add
> > > > complexity
> > > > > > and do anything in the mean time or just focus on that? Meaning
> we
> > > have
> > > > > > roughly 500mb of available capacity today so why don’t we spend
> those
> > > > man
> > > > > > hours we would spend on getting the second repo up on the
> extension
> > > > > > registry instead?
> > > > > >
> > > > > > > @Bryan do you have thoughts about the deployment of those nars
> in the
> > > > > > extension registry? Since we won’t be able to build the release
> > > binary
> > > > > > anymore would we still need to create separate repos for the
> nars or
> > > > > no?? I
> > > > > > have used the registry a little but I’m not 100% sure on your
> vision
> > > > for
> > > > > > the nars
> > > > > >
> > > > > > - Jeremy Dyer
> > > > > >
> > > > > > Sent from my iPhone
> > > > > >
> > > > > > > On Jan 12, 2018, at 10:18 PM, Tony Kurc <[email protected]>
> wrote:
> > > > > > >
> > > > > > > I was looking at nar sizes, and thought some data may be
> helpful. I
> > > > > used
> > > > > > my recent RC1 verification as a basis for getting file sizes, and
> > > just
> > > > > got
> > > > > > the file size for each file in the assembly named "*.nar". I
> don't
> > > know
> > > > > > whether the images I pasted in will go through, but I made some
> > > > graphs.
> > > > > > The first is a histogram of nar file size in buckets of 10MB. The
> > > > second
> > > > > > basically is similar to a cumulative distribution, the x axis is
> the
> > > > > "rank"
> > > > > > > of the nar (smallest to largest), and the y-axis is what fraction
> > > > > > > of the total size of all the nars comes from nars of that rank or
> > > > > > > lower. In
> > > other
> > > > > > words, on the graph, the dot at 60 and ~27 means that the
> smallest 60
> > > > > nars
> > > > > > contribute only ~27% of the total. Of note, the standard and
> > > framework
> > > > > nars
> > > > > > are at 83 and 84.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > On Fri, Jan 12, 2018 at 5:04 PM, Michael Moser <
> > > [email protected]
> > > > > > wrote:
> > > > > > > > And of course, as I hit <send> I thought of one more thing.
> > > > > > > >
> > > > > > > > We could keep all of the code in 1 git repo (1 project) but
> the
> > > > > > > > nifi-assembly part of the build could be broken up to build
> core
> > > > NiFi
> > > > > > > > separately from the tar/zip functional grouping of other
> NARs.
> > > > > > > >
> > > > > > > > On Fri, Jan 12, 2018 at 5:01 PM, Michael Moser <
> > > [email protected]
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Long term I would also like to see #3 be the solution. I
> think
> > > > what
> > > > > > > > > Joseph N described could be part of the capabilities of #3.
> > > > > > > > >
> > > > > > > > > I would like to add a note of caution with respect to
> > > reorganizing
> > > > > and
> > > > > > > > > releasing extension bundles separately:
> > > > > > > > >
> > > > > > > > > - the burden on release manager expands because many more
> > > > > projects
> > > > > > > > > have to be released; probably not all on each release cycle
> > > but
> > > > > it
> > > > > > could
> > > > > > > > > still be many
> > > > > > > > > - the chance of accidentally forgetting to release a
> project
> > > > in a
> > > > > > > > > release cycle becomes non-zero
> > > > > > > > > - sharing code between projects gets a bit harder because
> you
> > > > > have
> > > > > > to
> > > > > > > > > manage releasing projects in a specific order
> > > > > > > > > - it becomes harder to find all of the projects that need
> to
> > > > > change
> > > > > > > > > when shared code is added
> > > > > > > > > - the simple act of finding code becomes harder ... in
> which
> > > > > > project
> > > > > > > > > is that class in? (IDEs like IntelliJ can search in 1
> > > project,
> > > > > but
> > > > > > if they
> > > > > > > > > search across multiple projects, then I haven't learned
> how)
> > > > > > > > >
> > > > > > > > > I used to maintain several nars in separate projects, and
> > > recently
> > > > > > > > > reorganized them into 1 project (following NiFi's
> multi-module
> > > > maven
> > > > > > build)
> > > > > > > > > and life has become much easier!
> > > > > > > > >
> > > > > > > > > -- Mike
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Jan 12, 2018 at 4:33 PM, Chris Herrera <
> > > > > > [email protected]
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I very much like the solution proposed by Bryan below.
> This
> > > would
> > > > > > allow
> > > > > > > > > > > for a cleaner docker image as well, while still providing
> the
> > > > > > functionality
> > > > > > > > > > as needed. For sure, the extension registry will be
> great, but
> > > in
> > > > > > the mean
> > > > > > > > > > time this is an adequate mid step.
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Chris
> > > > > > > > > >
> > > > > > > > > > On Jan 12, 2018, 2:52 PM -0600, Bryan Bende <
> [email protected]
> > > > ,
> > > > > > wrote:
> > > > > > > > > > > Long term I'd like to see the extension registry take
> form
> > > and
> > > > > have
> > > > > > > > > > > that be the solution (#3).
> > > > > > > > > > >
> > > > > > > > > > > In the more near term, we could separate all of the
> NARs,
> > > > except
> > > > > > for
> > > > > > > > > > > framework and maybe standard processors & services,
> into a
> > > > > separate
> > > > > > > > > > > git repo.
> > > > > > > > > > >
> > > > > > > > > > > In that new git repo we could organize things like Joe
> N just
> > > > > > > > > > > described according to some kind of functional
> grouping. Each
> > > > of
> > > > > > these
> > > > > > > > > > > functional bundles could produce its own tar/zip which
> we can
> > > > > make
> > > > > > > > > > > available for download.
> > > > > > > > > > >
> > > > > > > > > > > That would separate the release cycles between core
> NiFi and
> > > > the
> > > > > > other
> > > > > > > > > > > NARs, and also avoid having any single binary artifact
> that
> > > > gets
> > > > > > too
> > > > > > > > > > > large.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jan 12, 2018 at 3:43 PM, Joseph Niemiec <
> > > > > > [email protected]
> > > > > > > > > > wrote:
> > > > > > > > > > > > just a random thought.
> > > > > > > > > > > >
> > > > > > > > > > > > Drop In Lib packs... All the Hadoop ones in one
> package for
> > > > > > example
> > > > > > > > > > that
> > > > > > > > > > > > can be added to a slim Nifi install. Another may be
> for
> > > > Cloud,
> > > > > or
> > > > > > > > > > Database
> > > > > > > > > > > > Interactions, Integration (JMS, FTP, etc) of course
> > > defining
> > > > > > these
> > > > > > > > > > groups
> > > > > > > > > > > > would be the tricky part... Or perhaps some type of
> > > installer
> > > > > > which
> > > > > > > > > > allows
> > > > > > > > > > > > you to elect which packages to download to add to
> the slim
> > > > > > install?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jan 12, 2018 at 3:10 PM, Joe Witt <
> > > > [email protected]
> > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Team,
> > > > > > > > > > > > >
> > > > > > > > > > > > > The NiFi convenience binary (tar.gz/zip) size has
> grown
> > > to
> > > > > > 1.1GB now
> > > > > > > > > > > > > in the latest release. Apache infra expanded it to
> 1.6GB
> > > > > > allowance
> > > > > > > > > > > > > for us but has stated this is the last time.
> > > > > > > > > > > > > https://issues.apache.org/jira/browse/INFRA-15816
> > > > > > > > > > > > >
> > > > > > > > > > > > > > We need to consider:
> > > > > > > > > > > > > 1) removing old nars/less commonly used nars/or
> > > > particularly
> > > > > > massive
> > > > > > > > > > > > > nars from the assembly we distribute by default.
> Folks
> > > can
> > > > > > still use
> > > > > > > > > > > > > these things if they want just not from our
> convenience
> > > > > binary
> > > > > > > > > > > > > 2) collapsing nars with highly repeating deps
> > > > > > > > > > > > > 3) Getting the extension registry baked into the
> Flow
> > > > > Registry
> > > > > > then
> > > > > > > > > > > > > moving to separate releases for extension bundles.
> The
> > > main
> > > > > > release
> > > > > > > > > > > > > then would be just the NiFi framework.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Any other ideas ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I'll plan to start identifying candidates for
> removal
> > > soon.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > Joe
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Joseph
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
>
