git repo is here: http://github.com/conferno/hypertable/tree/master/contrib/cc/PythonBinding/
I replaced Mateusz's files as they are outdated and cannot be compiled with the current Hypertable (see the previous thread here: http://groups.google.com/group/hypertable-dev/browse_thread/thread/52d88cd9bed771c3 )

2010/2/14, Masha <[email protected]>:
> Hi
>
> Of course, your feedback will be appreciated!
>
> What I failed to do, and would like to ask the other developers: how
> do I compile all six .so files into one big .so?
>
> Just adding -static to the linker flags (extra_link_args=["-static"] in
> setup.py) did not help. Linking failed with the error:
>
> /usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/4.4.1/crtbeginT.o:
> relocation R_X86_64_32 against `__DTOR_END__' can not be used when
> making a shared object; recompile with -fPIC
> /usr/lib/gcc/x86_64-linux-gnu/4.4.1/crtbeginT.o: could not read
> symbols: Bad value
>
> That would simplify the deployment of mapreduce workers.
> Actually, six libraries are not enough: "apt-get install boost log4cpp
> libdb4 ..." must be executed on a worker machine in advance (I compile
> Hypertable against the versions from the standard Ubuntu repositories).
>
> With one big .so we could avoid version mismatches in case the worker
> machine, for example, already has a boost that is too new or too old.
> To be safe from version hell, many more than 6 libraries would have to
> be deployed :(
>
> 2010/2/13, Mateusz Berezecki <[email protected]>:
>> Hi Masha
>>
>> This is awesome news. I'll check it out and prepare patches if you
>> don't mind.
>>
>> Thanks for a great job!
>>
>> And yes, Thrift does not feel solid at all! ;-)
>>
>> Mateusz
>>
>> On Feb 13, 2010, at 0:14, Masha <[email protected]> wrote:
>>
>>> Hello
>>>
>>> I have fixed the Python bindings to reflect the modern Hypertable and
>>> boost versions.
>>>
>>> Using the Python bindings, 'select *' over a large dataset is about 20
>>> times faster than using Thrift (I tested it on a single Linux-x64
>>> server; the Thrift client eats a lot of CPU).
>>>
>>> Also, the API is slightly improved:
>>>
>>> 1. TableScanner can act as an iterable object emitting Cells:
>>>
>>> # how it was
>>>
>>> scanner = table.create_scanner(scan_spec)
>>> cell = ht.Cell()
>>> while scanner.next(cell):
>>>     print "%s:%s %s" % (cell.row_key, cell.column_family, cell.value())
>>>
>>> # how it is
>>>
>>> for cell in table.create_scanner(scan_spec):
>>>     print "%s:%s %s" % (cell.row_key, cell.column_family, cell.value)
>>>
>>> # or even simpler
>>>
>>> for cell in client.hql("select * from table"):
>>>     print "%s:%s %s" % (cell.row_key, cell.column_family, cell.value)
>>>
>>> #--------------------------
>>>
>>> 2. client.hql("select ...") returns a TableScanner;
>>>    client.hql("show tables") returns a Python list; both of them are
>>>    iterables.
>>>
>>> 3. cell.value is now a getter; the parentheses are no longer required.
>>>
>>> 4. The parameter of the Client constructor is the path to
>>> 'hypertable.cfg', not the path to the installation directory.
>>> Deep inside, the Hypertable libraries use the path to the executable
>>> as a starting point to find 'hypertable.cfg', which fails when the
>>> executable is '/usr/bin/python'.
>>>
>>> As it is intended to be used on a client, it must work without a full
>>> Hypertable installation, and must work with more than one Hypertable
>>> server.
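To put points 1-4 above together, here is a minimal usage sketch. It uses only the calls described in this thread; the ht.Client name and the config path are my assumptions (only ht.Cell and the Client constructor are mentioned explicitly), and I have not re-checked the exact signatures against the current code:

import ht  # the compiled bindings module, ht.so

# point 4: the constructor takes the path to hypertable.cfg,
# not the installation directory
client = ht.Client("/etc/hypertable/hypertable.cfg")  # example path

# point 2: hql("show tables") returns a plain Python list
for name in client.hql("show tables"):
    print name

# points 1-3: hql("select ...") returns a TableScanner, which is an
# iterable emitting Cells; cell.value is a getter, no parentheses
for cell in client.hql("select * from table"):
    print "%s:%s %s" % (cell.row_key, cell.column_family, cell.value)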
>>>
>>> The files required to copy from the full installation are: 'ht.so',
>>> 'libHyperComm.so', 'libHyperCommon.so', 'libHyperTools.so',
>>> 'libHyperspace.so', 'libHypertable.so'.
>>> And, of course, 'hypertable.cfg'.
>>>
>>> It is my first experience with boost::python and I'm not sure whether
>>> it is correct to wrap the pointers (TablePtr, TableMutatorPtr) instead
>>> of the objects.
>>> So I suppose there could be some memory leaks; I have not investigated
>>> this yet.
>>> (I tried to wrap the objects - Table, TableMutator, TableScanner -
>>> but then I did not know how to return either a TableScanner or a list
>>> from client.hql(); with the pointers it is easy, so I went back to
>>> using them.)
>>>
>>> Compiling the Python bindings does not depend on the Hypertable
>>> compilation process and can be done independently later.
>>> Just run 'python setup.py build'.
>>> But note that the Hypertable libraries must be compiled with
>>> -DBUILD_SHARED_LIBS=ON (the precompiled binaries from hypertable.org
>>> are not).
>>>
>>> I put the code here for a while (sorry, I do not know how to use git):
>>> http://code.google.com/p/python-hypertable/source/browse/trunk/
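For anyone building from source: below is a sketch of what a setup.py along these lines could look like, based only on the libraries listed above. The source file name, the include/library directories and the 'boost_python' library name are my assumptions; the actual setup.py is in the repository linked at the top of this message. Hypertable itself must first be built with shared libraries enabled (-DBUILD_SHARED_LIBS=ON), then 'python setup.py build' compiles the module.

# sketch of a possible setup.py; paths and source name are placeholders
from distutils.core import setup
from distutils.extension import Extension

ht_module = Extension(
    "ht",
    sources=["ht.cc"],  # hypothetical source file name
    include_dirs=["/usr/local/include/hypertable"],  # adjust to your install
    library_dirs=["/usr/local/lib/hypertable"],
    # the shared libraries listed above, plus boost::python
    libraries=["HyperComm", "HyperCommon", "HyperTools",
               "Hyperspace", "Hypertable", "boost_python"],
    # extra_link_args=["-static"] was the attempt to get one big .so,
    # but it fails with the relocation error quoted earlier in the thread
)

setup(name="ht", version="0.1", ext_modules=[ht_module])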
