I support the change. It could help the project. Fortunately, the diagram is simply made with draw.io and can be recreated in no time. [image: flume_example.png]
Let me know where I can help.

On Sun, Apr 28, 2019 at 9:08 PM Mike Percy <mpe...@apache.org> wrote:

> Great, sounds like you made progress on the perf thing. I’m not talking
> about other products Flume is bundled with, simply what the project
> ships with the binary artifacts at release time.
>
> Mike
>
> Sent from my iPhone
>
> > On Apr 28, 2019, at 12:04 PM, Ralph Goers <ralph.go...@dslextreme.com> wrote:
> >
> > Yes, Mike. I understand that it is shipped with a product that uses it
> > for that purpose. To be honest, I have used Flume in 3 different
> > projects so far and none of them have integrated with Hadoop. I do
> > have an upcoming project that probably will, although Hadoop will
> > probably only be one of the destinations the data is delivered to. The
> > others might be a third party SIEM product as well as some kind of ELK
> > stack, so even in that case Hadoop wouldn’t be the primary “selling”
> > point.
> >
> > No, I haven’t done profiling yet. At this point my main focus is
> > Log4j. Once I get past that I can take a pass at profiling. It is
> > possible the problem might be in Log4j, but since the embedded
> > Appender just constructs the event and passes it to the Flume Embedded
> > Agent I would be surprised if it is in Log4j. However, while testing I
> > did find one bug already in Log4j that was causing a performance hit
> > with Flume and have corrected that.
> >
> > Ralph
> >
> >> On Apr 28, 2019, at 11:42 AM, Mike Percy <mpe...@apache.org> wrote:
> >>
> >> I’d certainly be in favor of updating the project description to be
> >> more general. That said, part of Flume’s value proposition is
> >> integration with a bunch of components off the shelf and the main
> >> ones it ships are Hadoop ecosystem components, so we shouldn’t
> >> completely ignore that when describing the project.
> >>
> >> Regarding the memory channel perf issues you observed, did you do any
> >> profiling? Do you think part of the issue could be Java GC? The
> >> memory channel tends to allocate and reclaim a lot of memory in a
> >> short period of time.
> >>
> >> Mike
> >>
> >> Sent from my iPhone
> >>
> >>> On Apr 28, 2019, at 11:35 AM, Ralph Goers <ralph.go...@dslextreme.com> wrote:
> >>>
> >>> What I am seeing is that people go to the home page and cut the
> >>> first paragraph as a description of Flume. All I am really proposing
> >>> is that we change that to more effectively describe Flume. The
> >>> description that is there is accurate but minimal. I would just like
> >>> to rephrase that paragraph to give a more complete description of
> >>> what Flume can be used for.
> >>>
> >>> As an aside, I have been working on Log4j, Spring-Cloud-Config and
> >>> docker. In doing that I have done some crude benchmarking which you
> >>> can see at
> >>> http://rgoers.github.io/log4j2-site/manual/cloud.html#Appender_Performance.
> >>> I was quite surprised by the performance of the Flume Embedded
> >>> Appender with a memory channel. I would have expected it to be more
> >>> in line with the Async Loggers and at the most in line with the
> >>> Rolling File Appender since the event is essentially handed to
> >>> another thread to be processed. It would be nice to see Flume be
> >>> able to be recommended for use as a log forwarder/aggregator for all
> >>> apps with Docker instead of just when guaranteed delivery is
> >>> required and I would love to upgrade the Flume documentation to
> >>> describe how to do that.
> >>>
> >>> Ralph
> >>>
> >>>> On Apr 28, 2019, at 9:58 AM, Bessenyei Balázs Donát <bes...@apache.org> wrote:
> >>>>
> >>>> I agree that marketing could be improved and I support finding a
> >>>> slogan that represents best what Flume is today.
> >>>> I am not sure about the wording that has been proposed, though. Can
> >>>> you please elaborate, Ralph?
> >>>>
> >>>>
> >>>> Thank you,
> >>>>
> >>>> Donat
> >>>>
> >>>>> On Sun, Apr 28, 2019 at 6:19 PM Ralph Goers <ralph.go...@dslextreme.com> wrote:
> >>>>>
> >>>>> When I read sites like
> >>>>> https://www.slant.co/versus/959/960/~fluentd_vs_flume I get a bit
> >>>>> discouraged at how people misunderstand Flume. Even a site like
> >>>>> https://www.predictiveanalyticstoday.com/data-ingestion-tools/ is
> >>>>> misleading by copying our home page by just saying "Flume is a
> >>>>> distributed, reliable, and available service for efficiently
> >>>>> collecting, aggregating, and moving large amounts of log data" and
> >>>>> then copying the image. This leads users to believe that Flume is
> >>>>> only useful in a small set of use cases and is intimately tied to
> >>>>> Hadoop.
> >>>>>
> >>>>> I believe the home page should be changed to say that "Flume is a
> >>>>> distributed, reliable, and available service for efficiently
> >>>>> collecting, aggregating, and streaming large amounts of data", and
> >>>>> then following up to indicate that it is appropriate to use to
> >>>>> move any kind of streaming data such as application, audit, or
> >>>>> system logs, real time events such as stock quotes, or user
> >>>>> transaction records.
> >>>>>
> >>>>> The second sentence should also be modified to say "It is robust
> >>>>> and fault tolerant with tunable reliability mechanisms that can
> >>>>> ensure guaranteed delivery and many failover and recovery
> >>>>> mechanisms".
> >>>>>
> >>>>> I also think the very first image should be modified to not show
> >>>>> just a web application and HDFS as it seems to give people the
> >>>>> impression that Flume is only usable with Hadoop or in web
> >>>>> applications. Unfortunately, only the png seems to have been
> >>>>> committed so redoing the diagram will mean starting from scratch.
> >>>>>
> >>>>> Thoughts?
> >>>>>
> >>>>> Ralph