On Wed, Apr 7, 2010 at 20:32, Andrzej Bialecki <a...@getopt.org> wrote:
> On 2010-04-07 18:54, Doğacan Güney wrote:
>> Hey everyone,
>>
>> On Tue, Apr 6, 2010 at 20:23, Andrzej Bialecki <a...@getopt.org> wrote:
>>> On 2010-04-06 15:43, Julien Nioche wrote:
>>>> Hi guys,
>>>>
>>>> I gather that we'll jump straight to 2.0 after 1.1 and that 2.0 will be
>>>> based on what is currently referred to as NutchBase. Shall we create a
>>>> branch for 2.0 in the Nutch SVN repository and have a label accordingly for
>>>> JIRA so that we can file issues / feature requests on 2.0? Do you think
>>>> that the current NutchBase could be used as a basis for the 2.0 branch?
>>>
>>> I'm not sure what the status of nutchbase is - it has missed a lot of
>>> fixes and changes in trunk since it was last touched ...
>>>
>>
>> I know... But I still intend to finish it, I just need to schedule
>> some time for it.
>>
>> My vote would be to go with nutchbase.
>
> Hmm .. this puzzles me, do you think we should port changes from 1.1 to
> nutchbase? I thought we should do it the other way around, i.e. merge
> nutchbase bits to trunk.
>

Hmm, I am a bit out of touch with the latest changes, but I know that the
differences between trunk and nutchbase are unfortunately rather large right
now. If merging nutchbase back into trunk would be easier, then sure, let's
do that.

>
>>>> * support for HBase : via ORM or not (see NUTCH-808
>>>> <https://issues.apache.org/jira/browse/NUTCH-808>)
>>>
>>> This IMHO is promising, this could open the doors to small-to-medium
>>> installations that are currently too cumbersome to handle.
>>>
>>
>> Yeah, there is already a simple ORM within nutchbase that is Avro-based
>> and should be generic enough to also support MySQL, Cassandra and
>> Berkeley DB. But any good ORM will be a very good addition.
>
> Again, the advantage of DataNucleus is that we don't have to handcraft
> all the mid- to low-level mappings, just the mid-level ones (JOQL or
> whatever) - the cost of maintenance is lower, and the number of backends
> that are supported out of the box is larger. Of course, this is just
> IMHO - we won't know for sure until we try to use both your custom ORM
> and DataNucleus...

I am obviously a bit biased here, but I have no strong feelings really.
DataNucleus is an excellent project. What I like about the Avro-based
approach is the essentially free MapReduce support we get and the fact that
supporting another language is easy. So, we can expose partial HBase data
through a server, and a Python client can easily read/write to it, thanks to
Avro. That being said, I am all for DataNucleus or something else.

>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>



-- 
Doğacan Güney
