So: I write a custom FlumeManager (and add a new FlumeAppender.ManagerType enum value called CUSTOM where I can plug in my own class name through a new attribute to be named later). I extract the batch processing logic out of the FlumeAvroManager so I can reuse it.
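In configuration terms, I picture something like this, where the CUSTOM type and the managerClass attribute are names I am making up for illustration, not an existing Log4j API:

    <Flume name="driver" type="CUSTOM"
           managerClass="com.example.driver.CachingFlumeManager">
      <DriverLayout/> <!-- the custom Layout described below -->
    </Flume>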
My FlumeManager does not send Flume events to a Flume Agent, but instead only caches the log events and waits for the driver to drain them. The FlumeManager cannot do the database IO since the client app does not know about Log4j. The manager has no idea it is collecting data to be included as a payload in a larger buffer mixed with data gathered from JDBC API calls. Even if it did, it would not know what to do with the result buffers coming back from a DBMS server. The driver needs the log4j-flume-ng module and whatever minimal set of Flume jars just to be able to see the Flume Event type, for example.

Then, the FlumeAppender has:

    @Override
    public void append(final LogEvent event) {
        final String name = event.getLoggerName();
        if (name != null) {
            for (final String pkg : EXCLUDED_PACKAGES) {
                if (name.startsWith(pkg)) {
                    return;
                }
            }
        }
        final FlumeEvent flumeEvent = factory.createEvent(event, mdcIncludes,
            mdcExcludes, mdcRequired, mdcPrefix, eventPrefix, compressBody);
        flumeEvent.setBody(getLayout().toByteArray(flumeEvent));
        manager.send(flumeEvent);
    }

Since I have to go through the FlumeAppender to use my custom FlumeManager, I also have to create a custom Layout that serializes events as I need. But then I am getting an extra FlumeEvent object where I only need the original LogEvent or the byte[], so that is a small penalty. I thought that I could implement a custom org.apache.logging.log4j.flume.appender.FlumeEventFactory to just pass through the LogEvent, but that is not possible since FlumeEvent is a class and not an interface. So the best my factory can do is pass null for everything except the log event.

I am not sure I am connecting all the dots here. To recap, I need:

- A small new feature in FlumeAppender to plug in a custom FlumeManager class (I am not sure about the constructor signature requirements, but that's where builders come in)
- A custom FlumeManager with the batch caching code extracted from the FlumeAvroManager (see the sketch just below)
- A custom Layout to serialize a LogEvent to whatever I need
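Here is a rough sketch of the kind of caching manager I mean; the class name is mine, and I am glossing over the real Log4j Flume manager base class and its constructor requirements:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.flume.Event;

    // Sketch only: caches Flume events in memory until the driver drains them.
    // A real version would extend the Log4j Flume manager base class.
    public class CachingFlumeManager {

        private List<Event> events = new ArrayList<>();

        // Called by the FlumeAppender for each log event.
        public synchronized void send(final Event event) {
            events.add(event);
        }

        // Called by the driver at drain time: swap in a fresh list and
        // return everything collected since the last drain.
        public synchronized List<Event> drain() {
            final List<Event> drained = events;
            events = new ArrayList<>();
            return drained;
        }
    }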
So far I do not see how that is easier than writing a Collection Appender.

Thank you for reading this far (again),
Gary

On Mon, Sep 26, 2016 at 2:51 PM, Ralph Goers <ralph.go...@dslextreme.com> wrote:
> Well, the key to this is “proprietary socket protocol”. Today, the Flume
> appender does everything you want except that it is hardwired to use the
> Avro RpcClient to send a batch of Flume events. If you need some other
> protocol you would need to write a new variation of the FlumeManager that
> sends the data however you want. In that case your server wouldn’t need to
> know anything about Flume, since all you would be doing is using Flume to
> handle the event buffering.
>
> I really think writing your own CollectionAppender is a very bad idea.
> Flume has already implemented it, it works, and it isn’t trivial to build
> from scratch.
>
> Ralph
>
> On Sep 26, 2016, at 1:57 PM, Gary Gregory <garydgreg...@gmail.com> wrote:
>
> Please allow me to restate the use case I have for the CollectionAppender,
> which is separate from any Flume-based or Syslog-based solution, use cases
> I also have. (Well, I have a Syslog use case, and whether or not Flume is
> in the picture will really be a larger discussion in my organization due
> to the requirement to run a Flume Agent.)
>
> A program (like a JDBC driver already using Log4j) communicates with
> another (like a DBMS, not written in Java). The client and server
> communicate over a proprietary socket protocol. The client sends a list of
> buffers (in one go) to the server to perform one or more operations. One
> kind of buffer this protocol defines is a log buffer (where each log event
> is serialized in a non-Java format). This allows each communication from
> the client to the server to say "This is what's happened up to now". What
> the server does with the log buffers is not important for this discussion.
>
> What is important to note is that the log buffer and other buffers go to
> the server in one BLOB, which is why I cannot (in this use case) send log
> events by themselves anywhere.
>
> I see that something (a CollectionAppender) must collect log events until
> the client is ready to serialize them and send them to the server. Once
> the events are drained out of the Appender (in one go by just getting the
> collection), events can collect in a new collection. A synchronous drain
> operation would create a new collection and return the old one.
>
> The question becomes: what kind of temporary location can the client use
> to buffer log events until drain time? A Log4j Appender is a natural place
> to collect log events since the driver uses Log4j. The driver will make it
> its business to drain the appender and work with the events at the right
> time. I am thinking that the Log4j Appender part is generic enough for
> inclusion in Log4j.
>
> Further thoughts?
>
> Thank you all for reading this far!
> Gary
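To pin down the drain semantics restated above, here is a minimal sketch of the CollectionAppender I have in mind; the plugin factory, builder, and sizing policies are left out:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.logging.log4j.core.LogEvent;
    import org.apache.logging.log4j.core.appender.AbstractAppender;

    // Sketch only: collects log events until the driver drains them.
    public class CollectionAppender extends AbstractAppender {

        private List<LogEvent> events = new ArrayList<>();

        protected CollectionAppender(final String name) {
            super(name, null, null, true);
        }

        @Override
        public synchronized void append(final LogEvent event) {
            // A real version may need to snapshot the event if the
            // configuration reuses mutable log events.
            events.add(event);
        }

        // Synchronous drain: create a new collection and return the old one.
        public synchronized List<LogEvent> drain() {
            final List<LogEvent> drained = events;
            events = new ArrayList<>();
            return drained;
        }
    }

The driver would call drain() at roundtrip time and serialize the returned list into the log buffer it chains onto the request.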
On Sun, Sep 25, 2016 at 1:20 PM, Ralph Goers <ralph.go...@dslextreme.com> wrote:
>> I guess I am not understanding your use case quite correctly. I am
>> thinking you have a driver that is logging and you want those logs
>> delivered to some other location to actually be written. If that is your
>> use case then the driver needs a log4j2.xml that configures the
>> FlumeAppender with either the memory or file channel (depending on your
>> needs) and points to the server(s) that is/are to receive the events. The
>> FlumeAppender handles sending them in batches with whatever size you want
>> (but will send them in smaller amounts if they are in the channel too
>> long). Of course you would need the log4j-flume and flume jars. So on the
>> driver side you wouldn’t need to write anything, just configure the
>> appender and make sure the jars are there.
>>
>> For the server that receives them you would also need Flume. Normally
>> this would be a standalone component, but it really wouldn’t be hard to
>> incorporate it into some other application. The only thing you would have
>> to write would be the sink that writes the events to the database or
>> whatever. To incorporate it into an application you would have to look at
>> the main() method of Flume and convert that to a thread that you kick off.
>>
>> Ralph
>>
>> On Sep 25, 2016, at 12:01 PM, Gary Gregory <garydgreg...@gmail.com> wrote:
>>
>> Hi Ralph,
>>
>> Thanks for your feedback. Flume is great in the scenarios that do not
>> involve sending a log buffer from the driver itself.
>>
>> I can't require a Flume Agent to be running 'on the side' for the use
>> case where the driver chains a log buffer at the end of the train of
>> database IO buffers. For completeness, talking about this Flume scenario:
>> if I read you right, I would also need to write a custom Flume sink, which
>> would also be in memory, until the driver is ready to drain it. Or, I
>> could query some other 'safe' and 'reliable' Flume sink that the driver
>> could then drain of events when it needs to.
>>
>> Narrowing down on the use case where the driver chains a log buffer at
>> the end of the train of database IO buffers, I think I'll have to see
>> about converting the Log4j ListAppender into a more robust and flexible
>> version. I think I'll call it a CollectionAppender and allow various
>> Collection implementations to be plugged in.
>>
>> Gary
>>
>> On Sat, Sep 24, 2016 at 3:44 PM, Ralph Goers <ralph.go...@dslextreme.com> wrote:
>>
>>> If you are buffering events in memory you run the risk of losing events
>>> if something should fail.
>>>
>>> That said, if I had your requirements I would use the FlumeAppender. It
>>> has either an in-memory option to buffer as you are suggesting or it can
>>> write to a local file to prevent data loss if that is a requirement. It
>>> already has the configuration options you are looking for and has been
>>> well tested. The only downside is that you need to have either a Flume
>>> instance receiving the messages or something that can receive Flume
>>> events over Avro, but it is easier just to use Flume and write a custom
>>> sink to do what you want with the data.
>>>
>>> Ralph
>>>
>>> On Sep 24, 2016, at 3:13 PM, Gary Gregory <garydgreg...@gmail.com> wrote:
>>>
>>> Hi All,
>>>
>>> I can't believe it, but through a convoluted use case, I actually need
>>> an in-memory list appender, very much like our test-only ListAppender.
>>>
>>> The requirement is as follows.
>>>
>>> We have a JDBC driver and matching proprietary database that specializes
>>> in data virtualization of mainframe resources like DB2, VSAM, IMS, and
>>> all sorts of non-SQL data sources
>>> (http://www.rocketsoftware.com/products/rocket-data/rocket-data-virtualization).
>>>
>>> The high-level requirement is to merge the driver log into the server's
>>> log for full end-to-end traceability and debugging.
>>>
>>> When the driver is running on the z/OS mainframe, it can be configured
>>> with a z/OS-specific Appender that can talk to the server log module
>>> directly.
>>>
>>> When the driver is running elsewhere, it can talk to the database via a
>>> Syslog socket Appender. This requires more setup on the server side and
>>> for the server to do special magic to know how the incoming log events
>>> match up with server operations. Tricky.
>>>
>>> The customer should also be able to configure the driver such that
>>> anytime the driver communicates with the database, it sends along
>>> whatever log events have accumulated since the last client-server
>>> roundtrip. This allows the server to match exactly the connection and
>>> operations the client performed with the server's own logging.
>>>
>>> In order to do that I need to buffer all log events in an Appender and,
>>> when it's time, get the list of events and reset the appender to a new
>>> empty list so events can keep accumulating.
>>>
>>> My proposal is to turn our ListAppender into such an appender. For
>>> sanity, the appender could be configured with various sizing policies
>>> (sketched just after this email):
>>>
>>> - open: the list grows unbounded
>>> - closed: the list grows to a given size and _new_ events are dropped on
>>> the floor beyond that
>>> - latest: the list grows to a given size and _old_ events are dropped on
>>> the floor beyond that
>>>
>>> Thoughts?
>>>
>>> Gary
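As a sketch of how those three sizing policies might behave (the names and the class are just illustration, not a proposed API):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import org.apache.logging.log4j.core.LogEvent;

    // Illustration only of the open/closed/latest sizing policies.
    enum SizingPolicy { OPEN, CLOSED, LATEST }

    final class BoundedEventList {

        private final Deque<LogEvent> events = new ArrayDeque<>();
        private final SizingPolicy policy;
        private final int capacity; // ignored for OPEN

        BoundedEventList(final SizingPolicy policy, final int capacity) {
            this.policy = policy;
            this.capacity = capacity;
        }

        synchronized void add(final LogEvent event) {
            if (policy != SizingPolicy.OPEN && events.size() >= capacity) {
                if (policy == SizingPolicy.CLOSED) {
                    return; // closed: drop the new event on the floor
                }
                events.removeFirst(); // latest: drop the oldest event
            }
            events.addLast(event);
        }
    }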
--
E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
Java Persistence with Hibernate, Second Edition <http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory