[ https://issues.apache.org/jira/browse/SAMZA-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martin Kleppmann updated SAMZA-200:
-----------------------------------
Assignee: (was: Martin Kleppmann)
> Explore using MySQL changelog as input stream
> ---------------------------------------------
>
> Key: SAMZA-200
> URL: https://issues.apache.org/jira/browse/SAMZA-200
> Project: Samza
> Issue Type: New Feature
> Reporter: Martin Kleppmann
>
> Samza is designed with good support for database changelogs, but the current
> open source release is mostly centered around Kafka. It would be good to have
> out-of-the-box support for some common databases, such as MySQL, as well.
> [Databus|http://www.socc2012.org/s18-das.pdf?attredirects=0] is LinkedIn's
> change capture tool, but the current open source release focuses mainly on
> Oracle. There is an open source release of [Databus for
> MySQL|https://github.com/linkedin/databus/wiki/Databus-for-MySQL], but it's a
> proof-of-concept implementation, not the one used by LinkedIn in production.
> (The one used by LinkedIn requires a patched version of MySQL.) The open
> source Databus uses [Open
> Replicator|https://code.google.com/p/open-replicator/] to connect to a MySQL
> server as a slave, and parses the binlog to find any inserts, updates or
> deletes.
>
> I played around a bit with Open Replicator today, and got it working: a
> small Scala program that could get a real-time feed of all changes happening
> in a MySQL database. However, I have some doubts about the quality of the
> library (the code is not very good, it has only very cursory tests, the
> original maintainer hasn't touched it for 18 months, and there are reports of
> nasty bugs -- e.g. blowing up on any negative number). There don't seem to be
> any better Java binlog parsers out there. But I did skim the source of Open
> Replicator, and it's not too complicated -- it seems quite feasible to write
> a MySQL binlog parser ourselves.
>
> This is still very much at an exploratory stage, but I think it could be really
> cool to have database changelog support easily available in Samza.