I support the change. It could help the project. Fortunately, the diagram
is simple and was made with draw.io, so it can be recreated in no time.
[image: flume_example.png]

Let me know where I can help.


On Sun, Apr 28, 2019 at 9:08 PM Mike Percy <mpe...@apache.org> wrote:

> Great, sounds like you made progress on the perf thing. I’m not talking
> about other products Flume is bundled with, simply what the project ships
> with the binary artifacts at release time.
>
> Mike
>
> Sent from my iPhone
>
> > On Apr 28, 2019, at 12:04 PM, Ralph Goers <ralph.go...@dslextreme.com>
> wrote:
> >
> > Yes, Mike. I understand that it is shipped with a product that uses it
> for that purpose. To be honest, I have used Flume in 3 different projects
> so far, and none of them has integrated with Hadoop. I do have an upcoming
> project that probably will, although Hadoop will likely be only one of
> the destinations the data is delivered to. The others might be a third-party
> SIEM product as well as some kind of ELK stack, so even in that case
> Hadoop wouldn’t be the primary “selling” point.
> >
> > No, I haven’t done profiling yet. At this point my main focus is Log4j.
> Once I get past that I can take a pass at profiling. It is possible the
> problem is in Log4j, but since the embedded Appender just constructs
> the event and passes it to the Flume Embedded Agent, I would be surprised if
> it were. However, while testing I did already find one bug in Log4j
> that was causing a performance hit with Flume, and I have corrected it.
> >
> > Ralph
> >
> >> On Apr 28, 2019, at 11:42 AM, Mike Percy <mpe...@apache.org> wrote:
> >>
> >> I’d certainly be in favor of updating the project description to be
> more general. That said, part of Flume’s value proposition is off-the-shelf
> integration with a bunch of components, and the main ones it ships are
> Hadoop ecosystem components, so we shouldn’t completely ignore that when
> describing the project.
> >>
> >> Regarding the memory channel perf issues you observed, did you do any
> profiling? Do you think part of the issue could be Java GC? The memory
> channel tends to allocate and reclaim a lot of memory in a short period of
> time.
> >>
> >> Mike
> >>
> >>> On Apr 28, 2019, at 11:35 AM, Ralph Goers <ralph.go...@dslextreme.com>
> wrote:
> >>>
> >>> What I am seeing is that people go to the home page and copy the first
> paragraph as a description of Flume. All I am really proposing is that we
> change that to describe Flume more effectively. The description that is
> there is accurate but minimal. I would just like to rephrase that paragraph
> to give a more complete description of what Flume can be used for.
> >>>
> >>> As an aside, I have been working on Log4j, Spring-Cloud-Config and
> Docker. In doing that I have done some crude benchmarking, which you can see
> at
> http://rgoers.github.io/log4j2-site/manual/cloud.html#Appender_Performance.
> I was quite surprised by the performance of the Flume Embedded Appender with a
> memory channel. I would have expected it to be more in line with the Async
> Loggers, or at worst in line with the Rolling File Appender, since the
> event is essentially handed to another thread to be processed. It would be
> nice to see Flume recommended for use as a log
> forwarder/aggregator for all apps with Docker instead of just when
> guaranteed delivery is required, and I would love to update the Flume
> documentation to describe how to do that.
> >>>
> >>> Ralph
> >>>
> >>>> On Apr 28, 2019, at 9:58 AM, Bessenyei Balázs Donát <
> bes...@apache.org> wrote:
> >>>>
> >>>> I agree that marketing could be improved and I support finding a
> >>>> slogan that best represents what Flume is today.
> >>>> I am not sure about the wording that has been proposed, though. Can
> >>>> you please elaborate, Ralph?
> >>>>
> >>>>
> >>>> Thank you,
> >>>>
> >>>> Donat
> >>>>
> >>>>> On Sun, Apr 28, 2019 at 6:19 PM Ralph Goers <
> ralph.go...@dslextreme.com> wrote:
> >>>>>
> >>>>> When I read sites like
> https://www.slant.co/versus/959/960/~fluentd_vs_flume I get a bit
> discouraged by how people misunderstand Flume. Even a site like
> https://www.predictiveanalyticstoday.com/data-ingestion-tools/ is
> misleading: it copies our home page, saying only “Flume is a distributed,
> reliable, and available service for efficiently collecting, aggregating,
> and moving large amounts of log data”, and then copies the image. This
> leads users to believe that Flume is only useful in a small set of use
> cases and is intimately tied to Hadoop.
> >>>>>
> >>>>> I believe the home page should be changed to say that
> “Flume is a distributed, reliable, and available service for efficiently
> collecting, aggregating, and streaming large amounts of data”, followed by
> a statement that it is appropriate for moving any kind of
> streaming data, such as application, audit, or system logs, real-time events
> such as stock quotes, or user transaction records.
> >>>>>
> >>>>> The second sentence should also be modified to say “It is robust and
> fault tolerant with tunable reliability mechanisms that can ensure
> guaranteed delivery and many failover and recovery mechanisms”.
> >>>>>
> >>>>> I also think the very first image should be modified to not show
> just a web application and HDFS, as it seems to give people the impression
> that Flume is only usable with Hadoop or in web applications.
> Unfortunately, only the PNG seems to have been committed, so redoing the
> diagram will mean starting from scratch.
> >>>>>
> >>>>> Thoughts?
> >>>>>
> >>>>> Ralph
> >>>>
> >>>
> >>
> >>
> >
> >
>
>
