On Mon, Jun 9, 2014 at 1:51 AM, Mikhail Khludnev <[email protected]> wrote:
> - joins/caching - seem possible with Morphlines but still there is no such
> command
> - delta import - scenario we don't need to forget to handle it
> - threads (it's completely out Morphline's concerns)
> - distributed processing - it would be great if we can partition
> datasource

Here are a few things to follow up on. The gap is that Morphline is built to
be invoked at the map stage of Hadoop, hence it is really slim itself and
relies on core Hadoop features. Thus, we need to either build such a harness
ourselves, reuse the old DIH ones, or check Flume (tbc).

So, the TODO list also includes:
- web IDE for editing the DSL;
- long-running task tracking/status check and heartbeat with REST access;
- let's think one step forward - consider threads.

I suppose the most efficient and safe idea is to partition the datasource
and spawn a few threads, each with its own Morphline pipe. Then it's better
to call SolrServer concurrently via loadSolr:
http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/loadSolr

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<[email protected]>
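
To make the thread idea above a bit more concrete, here is a rough sketch of "partition the datasource, one pipe per thread". It is only an illustration under assumptions: `Pipe` is a hypothetical stand-in for a compiled Morphline pipe (it is not a Kite API), records are plain strings, and `PartitionedLoad`, `partition` and `run` are names made up for this example. A real harness would build one Morphline per worker and let its last command write to Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

final class PartitionedLoad {

    /** Hypothetical stand-in for one Morphline pipe; NOT a Kite API. */
    interface Pipe {
        /** Returns true if the record was processed successfully. */
        boolean process(String record);
    }

    /** Splits records into at most `parts` contiguous slices (parts must be > 0). */
    static List<List<String>> partition(List<String> records, int parts) {
        List<List<String>> slices = new ArrayList<>();
        int size = (records.size() + parts - 1) / parts; // ceiling division
        for (int i = 0; i < records.size(); i += size) {
            slices.add(records.subList(i, Math.min(i + size, records.size())));
        }
        return slices;
    }

    /** Runs every slice on its own worker; returns how many records succeeded. */
    static int run(List<String> records, int parts, Supplier<Pipe> pipeFactory)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> slice : partition(records, parts)) {
                results.add(pool.submit(() -> {
                    // Each task builds its own pipe, so no pipe is shared across threads.
                    Pipe pipe = pipeFactory.get();
                    int ok = 0;
                    for (String record : slice) {
                        if (pipe.process(record)) {
                            ok++;
                        }
                    }
                    return ok;
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // rethrows a worker failure as ExecutionException
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the factory is that nothing mutable is shared between workers except whatever sits at the end of the pipe, which is where the concurrent SolrServer calls would happen.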
