Hi,

Thanks for starting this thread. Here is my feedback.

I think the architecture is too complicated for wide adoption, since it
requires installing all of the following:

HDFS.
HIVE.
IMPALA.
KAFKA.
SPARK (YARN).
YARN.
Zookeeper.

Currently there are far too many dependencies, which discourages a lot of
users from using it because they have to deploy all of the required software
first. I think for wide adoption we should minimize the dependencies and have
a more pluggable architecture. For example, I am not sure why HIVE and IMPALA
are both required. Why not just use Spark SQL, since Spark is already a
dependency? Or users may want to use a distributed query engine of their own
choosing, such as Apache Drill or something else. We should be flexible
enough to provide that option.
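
To make the pluggable idea a bit more concrete, here is a rough Python sketch
of what I have in mind. Everything in it is hypothetical: QueryEngine,
SQLiteEngine, ENGINES and make_engine are names I made up for illustration,
not existing Spot code, and SQLite is only a stand-in backend so the sketch
runs without a cluster. A Spark SQL, Impala or Drill backend would be another
class behind the same interface.

```python
from abc import ABC, abstractmethod
import sqlite3


class QueryEngine(ABC):
    """Hypothetical interface; not part of the current Spot code base."""

    @abstractmethod
    def query(self, sql, params=()):
        """Run a SQL statement and return the rows as a list of tuples."""


class SQLiteEngine(QueryEngine):
    """Stand-in backend so the sketch runs without any cluster software."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)

    def query(self, sql, params=()):
        cur = self._conn.execute(sql, params)
        rows = cur.fetchall()
        self._conn.commit()
        return rows


# Registry mapping a configuration value to a backend class.
# Other engines would be registered here under their own names.
ENGINES = {"sqlite": SQLiteEngine}


def make_engine(name, **kwargs):
    """Build the backend selected by name, e.g. from a config file."""
    try:
        backend = ENGINES[name]
    except KeyError:
        raise ValueError("unknown query engine: %s" % name)
    return backend(**kwargs)
```

With something like this, the rest of the code would only call
make_engine(...) and query(...), and the choice of engine becomes a
deployment setting rather than a hard dependency.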

Also, I see that HDFS is used so that collectors can receive file paths
through Kafka and then read the files. How big are these files? Do we
really need HDFS for this? Why not provide more ways to send data, such as
sending the data directly through Kafka, or just leaving it up to the user to
specify the file location as an argument to the collector process?
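
As a minimal sketch of the last option (the file location passed as an
argument), assuming a collector that simply reads a local text log: the
--input flag and the parse_args, read_records and main functions below are
all hypothetical names for illustration, not the existing collector's
interface.

```python
import argparse


def parse_args(argv=None):
    """Parse the (hypothetical) collector command line."""
    parser = argparse.ArgumentParser(description="Hypothetical collector")
    parser.add_argument("--input", required=True,
                        help="path to a local log file to ingest")
    return parser.parse_args(argv)


def read_records(path):
    """Yield the non-empty lines of the file, without the trailing newline."""
    with open(path, "r") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                yield line


def main(argv=None):
    """Read every record from the given file and return how many were seen."""
    args = parse_args(argv)
    count = 0
    for _record in read_records(args.input):
        # A real collector would parse and forward the record here.
        count += 1
    return count
```

Calling main(["--input", "/some/local/file.log"]) would then ingest that file
directly, with no HDFS or Kafka in the path.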

Finally, I learned that generating NetFlow data requires specific hardware.
This really means Apache Spot is not meant for everyone. I thought Apache
Spot could be used to analyze the network traffic of any machine, but if it
requires specific hardware then I think it is targeted at a specific group
of people.

The real strength of Apache Spot should mainly be just analyzing network
traffic through ML.

Thanks!

On Thu, Apr 13, 2017 at 4:28 PM, Segerlind, Nathan L <
[email protected]> wrote:

> Thanks, Nate,
>
> Nate.
>
>
> -----Original Message-----
> From: Nate Smith [mailto:[email protected]]
> Sent: Thursday, April 13, 2017 4:26 PM
> To: [email protected]
> Cc: [email protected]; [email protected]
> Subject: Re: [Discuss] - Future plans for Spot-ingest
>
> I was really hoping it came through ok,
> Oh well :)
> Here’s an image form:
> http://imgur.com/a/DUDsD
>
>
> > On Apr 13, 2017, at 4:05 PM, Segerlind, Nathan L <
> [email protected]> wrote:
> >
> > The diagram became garbled in the text format.
> > Could you resend it as a pdf?
> >
> > Thanks,
> > Nate
> >
> > -----Original Message-----
> > From: Nathanael Smith [mailto:[email protected]]
> > Sent: Thursday, April 13, 2017 4:01 PM
> > To: [email protected]; [email protected];
> [email protected]
> > Subject: [Discuss] - Future plans for Spot-ingest
> >
> > How would you like to see Spot-ingest change?
> >
> > A. continue development on the Python Master/Worker with focus on
> >    performance / error handling / logging
> > B. Develop Scala based ingest to be inline with code base from ingest,
> >    ml, to OA (UI to continue being ipython/JS)
> > C. Python ingest Worker with Scala based Spark code for normalization
> >    and input into DB
> >
> > Including the high level diagram:
> > +-----------------------------------------------------------
> -------------------------------+
> > | +--------------------------+
> +-----------------+        |
> > | | Master                   |  A. B. C.                        |
> Worker          |        |
> > | |    A. Python             +---------------+      A.          |   A.
> Python     |        |
> > | |    B. Scala              |               |    +------------->
>          +----+   |
> > | |    C. Python             |               |    |             |
>          |    |   |
> > | +---^------+---------------+               |    |
>  +-----------------+    |   |
> > |     |      |                               |    |
>               |   |
> > |     |      |                               |    |
>               |   |
> > |     |     +Note--------------+             |    |
>  +-----------------+    |   |
> > |     |     |Running on a      |             |    |             | Spark
> Streaming |    |   |
> > |     |     |worker node in    |             |    |      B. C.  | B.
> Scala        |    |   |
> > |     |     |the Hadoop cluster|             |    |    +--------> C.
> Scala        +-+  |   |
> > |     |     +------------------+             |    |    |        |
>          | |  |   |
> > |   A.|                                      |    |    |
> +-----------------+ |  |   |
> > |   B.|                                      |    |    |
>             |  |   |
> > |   C.|                                      |    |    |
>             |  |   |
> > | +----------------------+          +-v------+----+----+-+
>  +--------------v--v-+ |
> > | |                      |          |                    |           |
>                  | |
> > | |   Local FS:          |          |    hdfs            |           |
> Hive / Impala    | |
> > | |  - Binary/Text       |          |                    |           |
>  - Parquet -     | |
> > | |    Log files -       |          |                    |           |
>                  | |
> > | |                      |          |                    |           |
>                  | |
> > | +----------------------+          +--------------------+
>  +-------------------+ |
> > +-----------------------------------------------------------
> -------------------------------+
> >
> > Please let me know your thoughts,
> >
> > - Nathanael
> >
> >
> >
>
>
