As much as I would love to convert a new data engineer to the ways of Clojure, 
in my opinion, choosing a language to solve a problem is rarely a wise move. Do 
you have a team of engineers ready and willing to learn Clojure, or are you 
doing this yourself? We do a lot of work with all of the tools you mention (in 
Clojure), but we built a lot of the frameworks ourselves or wrote wrappers 
around Java tools. That is not for the newbie. If your goal is to build this 
pipeline for your boss and you have any sort of deadline, do yourself a favor 
and pick an existing, well-documented, easily googleable framework in a 
language that your team is familiar with. There are a ton of hurdles with 
everything you mentioned without even getting to Clojure. You’re jumping in the 
deep end of the pool with no life jacket and you don’t know how to swim.

That said, if you ignore my advice you will learn a lot and we will be here to 
help, just be warned 😎
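
To make the batch step from the thread below concrete: whatever framework ends 
up running it, the job (aggregate records stored on HDFS, write the result to 
an HBase table) comes down to mapping each record to a key and then reducing 
per key. Here is a minimal sketch in plain Python, with no Hadoop, Spark, or 
Cascalog involved. The record fields `device_id` and `temperature` and the 
sample readings are made up for illustration; they are not from the original 
question.

```python
from collections import defaultdict

# Hypothetical IoT readings. In the real pipeline these would be read from
# HDFS; the field names are assumptions made for this sketch.
READINGS = [
    {"device_id": "sensor-a", "temperature": 20.0},
    {"device_id": "sensor-b", "temperature": 30.0},
    {"device_id": "sensor-a", "temperature": 22.0},
]


def map_phase(records):
    """Emit one (key, (sum, count)) pair per input record."""
    for record in records:
        yield record["device_id"], (record["temperature"], 1)


def reduce_phase(pairs):
    """Combine the partial (sum, count) values for each key."""
    totals = defaultdict(lambda: (0.0, 0))
    for key, (value_sum, count) in pairs:
        acc_sum, acc_count = totals[key]
        totals[key] = (acc_sum + value_sum, acc_count + count)
    return dict(totals)


def aggregate(records):
    """Per-device count and mean: the kind of row a batch view might hold."""
    reduced = reduce_phase(map_phase(records))
    return {
        key: {"count": count, "mean": value_sum / count}
        for key, (value_sum, count) in reduced.items()
    }


if __name__ == "__main__":
    print(aggregate(READINGS))
```

Carrying (sum, count) pairs instead of a running mean keeps the reduce step 
associative, so partial results from different chunks can be merged in any 
order. That is the property a distributed framework relies on when it spreads 
the same computation across a cluster.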

> On Jul 4, 2019, at 2:09 PM, Thad Guidry <thadgui...@gmail.com> wrote:
> 
> Christian writes really good tools.  Sparkling is no exception.
> I have yet to use it in production myself however, since I haven't had the 
> need to use Clojure directly to solve any "data aggregation" problems.  Spark 
> and other tools do that well enough, naturally.
> 
> As far as using a tool/programming language to solve "data integration" 
> problems in large enterprise environments, I will ALWAYS use Open Source 
> tools for that purpose.  Clojure is no exception.  But I do tend to choose 
> open source hammers to drive nails.  Sometimes Clojure is missing the handle 
> on its hammer, as we have all experienced, but that's on us since WE have the 
> power to make Clojure better.  But often TIME is what we lack to build better 
> APIs, libraries, and tools for Clojure expansion.
> 
> The Apache ecosystem offers many tools & libraries for "big data" and "data 
> integration"  which I often turn to first because I lack TIME for building 
> (long tail), but have enough TIME for learning new things (shorter tail that 
> helps the long tail).
> https://projects.apache.org/projects.html?category 
> 
> Thad
> https://www.linkedin.com/in/thadguidry/
> 
> 
>> On Thu, Jul 4, 2019 at 12:37 PM Chris Nuernberger <ch...@techascent.com> 
>> wrote:
>> Thad,
>> 
>> Your approach seems very promising to me for a lot of jobs.  Spark runs on 
>> top of many things.
>> As far as a Clojure layer on top, what do you think about Sparkling?
>> 
>> 
>>> On Thu, Jul 4, 2019 at 8:43 AM Thad Guidry <thadgui...@gmail.com> wrote:
>>> "Batch" - doing things in chunks
>>> "Processing" - THE WORLD :-)  because it means so many different things to 
>>> so many folks (including your boss)
>>> 
>>> Without a doubt, you will love Apache Spark for your batch processing and 
>>> writing Spark Programs to conquer any World you are building.
>>> Spend time installing a Spark standalone deployment, and then use its 
>>> powerful Spark Shell (it has the feel of the Clojure REPL!!)
>>> If you just want to jump into a public cluster and try Spark, then I would 
>>> suggest Databricks. 
>>> Spend time reading about the features under the Libraries drop-down menu on 
>>> the Apache Spark website.
>>> 
>>> You might even be encouraged enough to write an official API in Clojure for 
>>> Apache Spark within a year!  (win-win)
>>> 
>>> One note of caution: if you are building something for the long term, you 
>>> will eventually need data versioning, ACID transactions, and schema 
>>> evolution. For this I use Delta Lake (not Datomic), since it's fully 
>>> compatible with Spark.
>>> 
>>> Best of luck!
>>> Thad
>>> https://www.linkedin.com/in/thadguidry/
>>> 
>>> 
>>>> On Thu, Jul 4, 2019 at 3:22 AM orazio <orazio.pist...@gmail.com> wrote:
>>>> Hi @atdixon and Thad, thanks for your help.
>>>> 
>>>> Here are more details about my project.
>>>> My big data layer is inspired by the Lambda architecture. The pipeline 
>>>> includes the following layers, with the tool chosen for each:
>>>> - NiFi for data ingestion and for publishing data/messages to a Kafka topic.
>>>> - Kafka as the message broker, which, with Kafka Connect, lets me store 
>>>> data in MongoDB (MongoDB sink, 1-day retention period) and HDFS (HDFS 
>>>> sink, 1-year retention period).
>>>> - Real-time processing with MongoDB, using its built-in query engine, which 
>>>> provides extensive querying, filtering, and searching abilities.
>>>> - Batch processing of the data stored on HDFS, which performs data 
>>>> aggregation and stores the result in an HBase table. The question is: which 
>>>> tool do you suggest for processing the data stored on HDFS?
>>>> - Serving layer with HBase/Phoenix to store the batch view and allow access 
>>>> to it.
>>>> 
>>>> Now I'm asking for your help in choosing the most appropriate tool to 
>>>> execute the batch jobs (MapReduce) that will have to aggregate the data.
>>>> Nathan Marz suggests Clojure/Cascalog. Do you know of other excellent 
>>>> Clojure/Hadoop work in the community for data processing?
>>>> If you know of some particularly appropriate tools, I could also consider 
>>>> other work/libraries outside the Clojure community.
>>>> 
>>>> Thanks
>>>> 
>>>> 
>>>> 
>>>> On Wednesday, July 3, 2019 at 14:56:09 UTC+2, Thad Guidry wrote:
>>>>> 
>>>>> "The best code is never written"
>>>>> 
>>>>> https://zeppelin.apache.org/ 
>>>>> https://nifi.apache.org/  
>>>>>  
>>>>> Thad
>>>>> https://www.linkedin.com/in/thadguidry/
>>>>> 
>>>>> 
>>>>>> On Tue, Jul 2, 2019 at 11:07 AM orazio <orazio...@gmail.com> wrote:
>>>>>> Hi All,
>>>>>> 
>>>>>> I'm a newbie to Clojure/Big Data, and I'm starting with Hadoop.
>>>>>> I have installed Hortonworks HDP 3.1.
>>>>>> I have to design a Big Data layer that ingests large IoT datasets and 
>>>>>> social media datasets, processes the data with MapReduce jobs, and 
>>>>>> produces aggregations to store in HBase tables.
>>>>>> 
>>>>>> For now, my focus is on the data processing issue. My question is: 
>>>>>> Is Clojure a good choice for distributed data processing on Hadoop?
>>>>>> I found Cascalog, a fully featured data processing and querying library 
>>>>>> for Clojure or Java. But are there any active maintainers for this 
>>>>>> library? 
>>>>>> Do you know of other excellent Clojure/Hadoop work in the community for 
>>>>>> data processing? 
>>>>>> 
>>>>>> I would appreciate some help.
>>>>>> 
>>>>>> Orazio
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Clojure" group.
>>>>>> To post to this group, send email to clo...@googlegroups.com
>>>>>> Note that posts from new members are moderated - please be patient with 
>>>>>> your first post.
>>>>>> To unsubscribe from this group, send email to
>>>>>> clo...@googlegroups.com
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/clojure?hl=en
>>>>>> --- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "Clojure" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>>> an email to clo...@googlegroups.com.
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/clojure/fbc26ffb-5f00-46a7-bf33-7a899f1ffead%40googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/d/optout.
