How would you like to see Spot-ingest change?
A. Continue development of the Python Master/Worker, with a focus on
   performance, error handling, and logging.
B. Develop a Scala-based ingest, in line with the rest of the code base from
   ingest and ML through OA (the UI would continue to be IPython/JS).
C. Use a Python ingest Worker with Scala-based Spark code for normalization
   and input into the DB.
I've included a high-level diagram of the three options:
+----------------------------------------------------------------------------+
| +----------------------+                  +------------------+             |
| | Master               | A. B. C.         | Worker           |             |
| |                      |            A.    |                  |             |
| | A. Python            +------+---------->| A. Python        +-----+       |
| | B. Scala             |      |           |                  |     |       |
| | C. Python            |      |           +------------------+     |       |
| +---^-----+------------+      |                                    |       |
|     |     |                   |           +------------------+     |       |
|     |  +--+-----------------+ |           | Spark Streaming  |     |       |
|     |  | Note:              | |    B. C.  | B. Scala         |     |       |
|     |  | Running on a       | +---------->| C. Scala         +--+  |       |
|     |  | worker node in     | |           +------------------+  |  |       |
|     |  | the Hadoop cluster | |                                 |  |       |
|     |  +--------------------+ |                                 |  |       |
|   A.|                         |                                 |  |       |
|   B.|                         |                                 |  |       |
|   C.|                         |                                 |  |       |
| +---+------------------+  +---v-------+           +-------------v--v---+   |
| | Local FS:            |  | hdfs      |           | Hive / Impala      |   |
| | - Binary/Text        |  |           |           | - Parquet          |   |
| |   Log files          |  |           |           |                    |   |
| +----------------------+  +-----------+           +--------------------+   |
+----------------------------------------------------------------------------+
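For reference, here is a minimal Python sketch of the Option A data flow in the diagram (Local FS to Master, Master to hdfs, Worker to Hive/Impala). It is an illustration under stated assumptions, not the current Spot-ingest code: an in-process `queue.Queue` stands in for whatever transport connects the Master and Worker, plain directories stand in for the local FS and HDFS, a Python list stands in for the Hive/Impala table, and the Worker reading the staged copy is assumed. The `master`/`worker`/`run_demo` names and the `src_ip,dst_ip,bytes` record format are hypothetical.

```python
from __future__ import annotations

import logging
import queue
import shutil
import tempfile
import threading
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(threadName)s %(levelname)s %(message)s")
log = logging.getLogger("ingest-sketch")  # hypothetical logger name

STOP = None  # sentinel that tells the Worker to shut down


def master(local_dir: Path, hdfs_dir: Path, jobs: queue.Queue) -> int:
    """Hypothetical Master: stage each local *.log file and hand its path to a Worker."""
    staged = 0
    for src in sorted(local_dir.glob("*.log")):
        dst = hdfs_dir / src.name  # plain directory copy standing in for an HDFS upload
        shutil.copy(src, dst)
        jobs.put(dst)
        staged += 1
        log.info("staged %s", dst.name)
    jobs.put(STOP)
    return staged


def worker(jobs: queue.Queue, table: list, errors: list) -> None:
    """Hypothetical Worker: normalize each staged file into (src_ip, dst_ip, bytes) rows."""
    while True:
        path = jobs.get()
        if path is STOP:
            break
        try:
            rows = []
            for line in path.read_text().splitlines():
                if not line.strip():
                    continue
                src_ip, dst_ip, nbytes = line.split(",")  # assumed record format
                rows.append((src_ip.strip(), dst_ip.strip(), int(nbytes)))
            table.extend(rows)  # only load a file once every line has parsed
            log.info("loaded %d rows from %s", len(rows), path.name)
        except (OSError, ValueError) as exc:
            errors.append(path.name)  # record the bad file and keep going
            log.error("failed on %s: %s", path.name, exc)


def run_demo() -> tuple[list, list]:
    """Run one Master and one Worker thread over two sample files."""
    with tempfile.TemporaryDirectory() as tmp:
        local_dir, hdfs_dir = Path(tmp, "local"), Path(tmp, "hdfs")
        local_dir.mkdir()
        hdfs_dir.mkdir()
        (local_dir / "a.log").write_text("10.0.0.1,10.0.0.2,512\n10.0.0.3,10.0.0.4,64\n")
        (local_dir / "b.log").write_text("not,a,number\n")  # deliberately malformed
        jobs: queue.Queue = queue.Queue()
        table: list = []
        errors: list = []
        consumer = threading.Thread(target=worker, args=(jobs, table, errors), name="worker")
        consumer.start()
        master(local_dir, hdfs_dir, jobs)
        consumer.join()  # Worker exits after it reads the STOP sentinel
        return table, errors


if __name__ == "__main__":
    print(run_demo())
```

Calling `run_demo()` stages both files, loads the two rows from `a.log`, and records `b.log` as failed without stopping the Worker. That per-file error handling and logging is the kind of work Option A would focus on.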
Please let me know your thoughts,
- Nathanael