[ https://issues.apache.org/jira/browse/HADOOP-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated HADOOP-2536:
----------------------------------

    Attachment: mapred_jdbc_v3.patch

Since Fredrik said that he cannot continue working on the patch, I have updated it with some changes. The changes include:
# Package and class names now use the DB prefix instead of Database.
# DBInputSplit is now an inner class of DBInputFormat.
# Instead of the type mapping that converted data types inside the library, a new DBWritable interface is introduced. Classes implement DBWritable to convert themselves from and to database tuples.
# DBRecordReader emits <LongWritable, T> pairs, where the key is the record number and T is a DBWritable.
# DBRecordWriter accepts <K, V> pairs, where K implements DBWritable (and is therefore written to the database) and V is discarded.
# Writes to the database use JDBC batch updates.
# Two ways of setting the input query are introduced.
# Improved documentation.
# Added a sample MapReduce program that reads data from the database and writes the results back to it. The program counts the number of pageviews in a synthetically generated access log and uses HSQLDB as an embedded database.
# Added a test case that runs the example job in the MiniCluster.

> MapReduce for MySQL
> -------------------
>
>                 Key: HADOOP-2536
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2536
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Fredrik Hedberg
>            Assignee: Fredrik Hedberg
>            Priority: Minor
>         Attachments: database-2.diff, database.diff, mapred_jdbc_v3.patch
>
>
> Add support for running MapReduce jobs over data residing in a MySQL table.
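The DBWritable contract and the reader/writer key-value shapes described in the change list can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the patch's code: it assumes DBWritable has the write(PreparedStatement) / readFields(ResultSet) shape of the interface that later shipped with Hadoop, and PageviewRecord, readAll, writeAll, recordingStatement and fakeResultSet are hypothetical helpers. So that it runs without Hadoop, HSQLDB or any JDBC driver, the statement and result set are java.lang.reflect.Proxy test doubles. readAll approximates DBRecordReader by keying each row with its record number; writeAll approximates the batch-update write path by writing each key through addBatch() and sending one executeBatch() (values are simply not passed, mirroring "V is discarded").

```java
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.*;

public class DBWritableSketch {
  // Hypothetical stand-in for DBWritable: a record maps itself to and from one database tuple.
  interface DBWritable {
    void write(PreparedStatement statement) throws SQLException;
    void readFields(ResultSet resultSet) throws SQLException;
  }

  // Hypothetical row type for the pageview example: (url, pageview).
  static class PageviewRecord implements DBWritable {
    String url;
    long pageview;
    @Override
    public void write(PreparedStatement statement) throws SQLException {
      statement.setString(1, url); // JDBC parameter indices are 1-based
      statement.setLong(2, pageview);
    }
    @Override
    public void readFields(ResultSet resultSet) throws SQLException {
      url = resultSet.getString(1); // JDBC column indices are 1-based
      pageview = resultSet.getLong(2);
    }
  }

  // Reader side: emit <record number, record> pairs, like DBRecordReader's <LongWritable, T>.
  static Map<Long, PageviewRecord> readAll(ResultSet rs) throws SQLException {
    Map<Long, PageviewRecord> out = new LinkedHashMap<>();
    for (long recordNumber = 0; rs.next(); recordNumber++) {
      PageviewRecord value = new PageviewRecord();
      value.readFields(rs);
      out.put(recordNumber, value);
    }
    return out;
  }

  // Writer side: fill the statement from each DBWritable key, queue it, then send one JDBC batch.
  static int[] writeAll(PreparedStatement st, List<? extends DBWritable> keys) throws SQLException {
    for (DBWritable key : keys) {
      key.write(st);
      st.addBatch();
    }
    return st.executeBatch();
  }

  // Test double: a PreparedStatement that records each batched parameter row; no database involved.
  static PreparedStatement recordingStatement(List<List<Object>> batches) {
    Map<Integer, Object> params = new TreeMap<>();
    return (PreparedStatement) Proxy.newProxyInstance(
        ClassLoader.getSystemClassLoader(), new Class<?>[] {PreparedStatement.class},
        (proxy, method, args) -> {
          switch (method.getName()) {
            case "setString": case "setLong":
              params.put((Integer) args[0], args[1]);
              return null;
            case "addBatch":
              batches.add(new ArrayList<>(params.values()));
              params.clear();
              return null;
            case "executeBatch":
              return new int[batches.size()]; // a real driver returns per-row update counts
            default:
              throw new UnsupportedOperationException(method.getName());
          }
        });
  }

  // Test double: a forward-only ResultSet over in-memory rows.
  static ResultSet fakeResultSet(List<Object[]> rows) {
    int[] cursor = {-1};
    return (ResultSet) Proxy.newProxyInstance(
        ClassLoader.getSystemClassLoader(), new Class<?>[] {ResultSet.class},
        (proxy, method, args) -> {
          switch (method.getName()) {
            case "next":
              return ++cursor[0] < rows.size();
            case "getString":
              return rows.get(cursor[0])[(Integer) args[0] - 1];
            case "getLong":
              return ((Number) rows.get(cursor[0])[(Integer) args[0] - 1]).longValue();
            default:
              throw new UnsupportedOperationException(method.getName());
          }
        });
  }

  public static void main(String[] args) throws SQLException {
    List<Object[]> rows = Arrays.asList(new Object[] {"/a", 3L}, new Object[] {"/b", 5L});
    Map<Long, PageviewRecord> read = readAll(fakeResultSet(rows));
    read.forEach((k, v) -> System.out.println(k + " -> " + v.url + " " + v.pageview));
    List<List<Object>> batches = new ArrayList<>();
    writeAll(recordingStatement(batches), new ArrayList<PageviewRecord>(read.values()));
    System.out.println(batches); // one parameter row per batched record
  }
}
```

Running main prints `0 -> /a 3` and `1 -> /b 5`, then `[[/a, 3], [/b, 5]]`, the parameter rows that would have gone out in a single batch. Against a real database, the doubles would be replaced by a PreparedStatement from Connection.prepareStatement and a ResultSet from executeQuery; the record class itself would not change, which is the point of moving the conversion into DBWritable instead of a library-side type mapping.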