Sorry to have missed the distribution list (This is a habit I need to
quit).

Please feel free to respond to this thread if the doc isn't accessible.

Cheers,
Arun

---------- Forwarded message ---------
From: Arun Manivannan <a...@arunma.com>
Date: Thu, Nov 15, 2018 at 2:47 PM
Subject: Implementation approach - Amaterasu 44 and 52
To: ya...@apache.org <ya...@apache.org>, Eyal Ben-Ivri <e...@shinto.io>


Hi Yaniv and Eyal,

As we discussed yesterday, I have written some rough implementation notes
for Amaterasu 44 and 52. I believe the approach sounds reasonable. Please
feel free to add more info at the bottom of this doc
<https://docs.google.com/document/d/1t5SBp4w4ypC0xtdH9Z90g3AYEO1xUYyqz6PWe0KPP1s/edit?ts=5bebae0e>
(and please let me know).

I think we have all the JIRAs required and I don't think there's a need for
another one.

Implementation Approach

Amaterasu 52

-------------------

   1. Create a getDatasetConfiguration on ConfigManager.kt (test cases to be
      added)

   2. Create possible config implementations for the first cut - Hive and File
      (Note that each of these configs must be supported with implementations
      for Spark to start off)
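
A minimal Kotlin sketch of steps 1 and 2, under stated assumptions: only
the name getDatasetConfiguration comes from the notes above. The
DatasetConfig hierarchy, its fields, and the constructor of this stand-in
ConfigManager are hypothetical and exist purely for illustration; the real
ConfigManager.kt would load the configs from the job's configuration rather
than take them as a list.

```kotlin
// Hypothetical config types for the first cut (Hive and File).
// Field names are illustrative assumptions, not the project's actual schema.
sealed class DatasetConfig {
    abstract val name: String
}

data class HiveDatasetConfig(
    override val name: String,
    val database: String,
    val table: String
) : DatasetConfig()

data class FileDatasetConfig(
    override val name: String,
    val uri: String,
    val format: String
) : DatasetConfig()

// Stand-in for the real ConfigManager: configs are passed in directly here.
class ConfigManager(configs: List<DatasetConfig>) {
    // Index the configs by dataset name for lookup.
    private val byName: Map<String, DatasetConfig> = configs.associateBy { it.name }

    // Returns the config registered under `name`, or fails loudly if none exists.
    fun getDatasetConfiguration(name: String): DatasetConfig =
        byName[name]
            ?: throw IllegalArgumentException("No dataset configured with name '$name'")
}
```

Modelling the configs as a sealed class is one possible choice: it lets the
compiler verify that any later "when" over the config types covers every
case.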

Amaterasu 44

-------------------

   1. Draw out a frameworks-common that has the parents for the
      implementations - FilePersistedDataset, HivePersistedDataset etc. (Write
      test cases; these could be easily tested)

   2. Implement the FilePersistedDatasetImpl and HivePersistedDatasetImpl for
      Spark

   3. Figure out a way to bridge Datasetconfig to DatasetImpl - let it be a
      simple “when” and have it reviewed.

   4. The final piece is Amacontext.persist - we retrieve the corresponding
      dataset config, use the bridge from step 3 and invoke the corresponding
      implementation.
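
The four steps above can be sketched in Kotlin as follows. This is an
illustration under assumptions, not the actual design: the class names
FilePersistedDataset, HivePersistedDataset and their Impl counterparts come
from the notes, but every signature, the datasetFor helper, the log list and
the AmaContext stand-in are hypothetical. A plain List<String> stands in for
a Spark DataFrame so that the sketch runs without a cluster.

```kotlin
// Hypothetical config types, restated so this sketch compiles on its own.
sealed class DatasetConfig
data class FileDatasetConfig(val uri: String, val format: String) : DatasetConfig()
data class HiveDatasetConfig(val database: String, val table: String) : DatasetConfig()

// Step 1: parents that would live in frameworks-common.
// T is the framework-specific data type (e.g. a Spark DataFrame).
interface PersistedDataset<T> {
    fun persist(data: T)
}
abstract class FilePersistedDataset<T>(val config: FileDatasetConfig) : PersistedDataset<T>
abstract class HivePersistedDataset<T>(val config: HiveDatasetConfig) : PersistedDataset<T>

// Step 2: framework-side implementations. Real Spark versions would write the
// data out; these stubs only record what they were asked to do.
class FilePersistedDatasetImpl(config: FileDatasetConfig, private val log: MutableList<String>) :
    FilePersistedDataset<List<String>>(config) {
    override fun persist(data: List<String>) {
        log += "wrote ${data.size} rows to ${config.uri} as ${config.format}"
    }
}

class HivePersistedDatasetImpl(config: HiveDatasetConfig, private val log: MutableList<String>) :
    HivePersistedDataset<List<String>>(config) {
    override fun persist(data: List<String>) {
        log += "wrote ${data.size} rows to ${config.database}.${config.table}"
    }
}

// Step 3: the bridge, a simple exhaustive "when" over the sealed config type.
fun datasetFor(config: DatasetConfig, log: MutableList<String>): PersistedDataset<List<String>> =
    when (config) {
        is FileDatasetConfig -> FilePersistedDatasetImpl(config, log)
        is HiveDatasetConfig -> HivePersistedDatasetImpl(config, log)
    }

// Step 4: stand-in for Amacontext.persist. Look up the config, bridge it to an
// implementation, and invoke that implementation.
class AmaContext(private val configs: Map<String, DatasetConfig>) {
    val log = mutableListOf<String>()

    fun persist(datasetName: String, data: List<String>) {
        val config = configs[datasetName]
            ?: throw IllegalArgumentException("No dataset configured with name '$datasetName'")
        datasetFor(config, log).persist(data)
    }
}
```

Because the stub implementations only record their calls, the bridge and
persist can be unit-tested without Spark or Hive being available.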


As a side note, we need to look into how Yarn/other runtime runs could be
tested. This is always a problem when we need to prepare an entire
cluster to confirm even mid-size changes.


Cheers,
Arun
