Sorry for leaving out the distribution list (this is a habit I need to quit).
Please feel free to respond to this thread if the doc isn't accessible.

Cheers,
Arun

---------- Forwarded message ---------
From: Arun Manivannan <a...@arunma.com>
Date: Thu, Nov 15, 2018 at 2:47 PM
Subject: Implementation approach - Amaterasu 44 and 52
To: ya...@apache.org <ya...@apache.org>, Eyal Ben-Ivri <e...@shinto.io>

Hi Yaniv and Eyal,

As we discussed yesterday, I have written some rough implementation notes for Amaterasu 44 and 52. I believe the approach is reasonable. Please feel free to add more info at the bottom of this doc (and please let me know when you do):
<https://docs.google.com/document/d/1t5SBp4w4ypC0xtdH9Z90g3AYEO1xUYyqz6PWe0KPP1s/edit?ts=5bebae0e>

I think we have all the JIRAs required, and I don't think there's a need for another one.

Implementation Approach

Amaterasu 52
-------------------
1. Create a getDatasetConfiguration on ConfigManager.kt (test cases to be added).
2. Create the possible config implementations for the first cut: Hive and File. (Note that, to start off, each of these configs must be backed by a Spark implementation.)

Amaterasu 44
-------------------
1. Draw out a frameworks-common that holds the parents for the implementations: FilePersistedDataset, HivePersistedDataset, etc. (Write test cases; these can be tested easily.)
2. Implement FilePersistedDatasetImpl and HivePersistedDatasetImpl for Spark.
3. Figure out a way to bridge Datasetconfig to DatasetImpl. Let it be a simple “when” and have it reviewed.
4. The final piece is Amacontext.persist: we retrieve the corresponding dataset config, use the bridge from step 3, and invoke the corresponding implementation.

As a side note, we need to look into how Yarn and other runtimes could be tested. This is always troublesome when we need to prepare an entire cluster just to confirm even mid-size changes.

Cheers,
Arun
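To make the notes a bit more concrete, here is a rough Kotlin sketch of how the pieces could fit together: dataset configs, the ConfigManager lookup, the framework-agnostic parents, their implementations, the “when” bridge from config to implementation, and the persist entry point. This is only an illustration under assumptions, not the agreed design. DatasetConfig, FileDatasetConfig, HiveDatasetConfig, PersistedDataset, toDatasetImpl and all fields and signatures are hypothetical, and AmaContext/ConfigManager here are stand-ins rather than the real Amaterasu classes. Spark is left out so the sketch runs on plain Kotlin; the "Impl" classes just return a description of what a Spark implementation would do.

```kotlin
// Hypothetical sketch only: none of these types or signatures are Amaterasu's actual API.
// Spark is deliberately left out; the "Impl" classes return a string describing the
// write a real Spark implementation would perform.

// Amaterasu 52, step 2: one config type per dataset kind (first cut: File and Hive).
sealed class DatasetConfig {
    abstract val name: String
}

data class FileDatasetConfig(
    override val name: String,
    val path: String,
    val format: String // assumed field, e.g. "parquet"
) : DatasetConfig()

data class HiveDatasetConfig(
    override val name: String,
    val database: String,
    val table: String
) : DatasetConfig()

// Amaterasu 52, step 1: stand-in ConfigManager with an assumed getDatasetConfiguration signature.
class ConfigManager(private val datasets: Map<String, DatasetConfig>) {
    fun getDatasetConfiguration(name: String): DatasetConfig =
        datasets[name] ?: throw IllegalArgumentException("No dataset configured with name '$name'")
}

// Amaterasu 44, step 1: framework-agnostic parents that would live in frameworks-common.
interface PersistedDataset<T> {
    fun persist(data: T): String
}

abstract class FilePersistedDataset<T>(val config: FileDatasetConfig) : PersistedDataset<T>
abstract class HivePersistedDataset<T>(val config: HiveDatasetConfig) : PersistedDataset<T>

// Amaterasu 44, step 2: stubbed "Spark" implementations (a real one would accept a DataFrame).
class FilePersistedDatasetImpl<T>(config: FileDatasetConfig) : FilePersistedDataset<T>(config) {
    override fun persist(data: T): String = "write ${config.format} to ${config.path}"
}

class HivePersistedDatasetImpl<T>(config: HiveDatasetConfig) : HivePersistedDataset<T>(config) {
    override fun persist(data: T): String = "saveAsTable ${config.database}.${config.table}"
}

// Amaterasu 44, step 3: the bridge, a plain "when" that the compiler checks for
// exhaustiveness because DatasetConfig is sealed (a new config type won't compile until handled).
fun <T> toDatasetImpl(config: DatasetConfig): PersistedDataset<T> = when (config) {
    is FileDatasetConfig -> FilePersistedDatasetImpl<T>(config)
    is HiveDatasetConfig -> HivePersistedDatasetImpl<T>(config)
}

// Amaterasu 44, step 4: persist looks up the config, bridges it, and delegates.
class AmaContext(private val configManager: ConfigManager) {
    fun <T> persist(datasetName: String, data: T): String {
        val config = configManager.getDatasetConfiguration(datasetName)
        return toDatasetImpl<T>(config).persist(data)
    }
}
```

With a ConfigManager built from a map such as "events" to FileDatasetConfig("events", "/data/events", "parquet"), calling persist("events", rows) on the context resolves the File config and delegates to FilePersistedDatasetImpl. Making the config type sealed is one way to keep the “when” bridge simple and reviewable: the bridge stays a single function, and adding a new dataset kind forces it to be updated.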