[Pytables-users] pytables 2.2 build error on amd64
Hi,

I tried to build pytables 2.2 on an x86-64 (amd64) Debian machine, but received the build errors below. I then successfully built 2.1.2 with no issues (it passes all tests).

Jeff

* Found numpy 1.5.0b1 package installed.
* Found numexpr 1.4 package installed.
* Found HDF5 headers at ``/usr/include``, library at ``/usr/lib64``.
* Found LZO 2 headers at ``/usr/include``, library at ``/usr/lib64``.
* Skipping detection of LZO 1 since LZO 2 has already been found.
* Could not find bzip2 headers and library; disabling support for it.
* Found pthreads headers at ``/usr/include``, library at ``/usr/lib64``.
running build_ext
cythoning tables/hdf5Extension.pyx to tables/hdf5Extension.c

Error converting Pyrex file to C:
...
    self.disk_type_id = AtomToHDF5Type(atom, self.byteorder)
    # Allocate space for the dimension axis info and fill it
    dims = numpy.array(shape, dtype=numpy.intp)
    self.rank = len(shape)
    self.dims = npy_malloc_dims(self.rank, <npy_intp *>(dims.data))
    ^
/data/arb/distros/tables-2.2/tables/hdf5Extension.pyx:836:31: Cannot convert 'definitions.hsize_t *' to Python object

Error converting Pyrex file to C:
...
    # Save the array
    complib = PyString_AsString(self.filters.complib or '')
    version = PyString_AsString(self._v_version)
    class_ = PyString_AsString(self._c_classId)
    self.dataset_id = H5ARRAYmake(self.parent_id, self.name, version,
                                  self.rank, self.dims,
    ^
/data/arb/distros/tables-2.2/tables/hdf5Extension.pyx:844:49: Cannot convert Python object to 'definitions.hsize_t *'

Error converting Pyrex file to C:
...
    atom = self.atom
    itemsize = atom.itemsize
    self.disk_type_id = AtomToHDF5Type(atom, self.byteorder)
    self.rank = len(self.shape)
    self.dims = malloc_dims(self.shape)
    ^
/data/arb/distros/tables-2.2/tables/hdf5Extension.pyx:882:27: Cannot convert 'definitions.hsize_t *' to Python object

Error converting Pyrex file to C:
...
    self.disk_type_id = AtomToHDF5Type(atom, self.byteorder)
    self.rank = len(self.shape)
    self.dims = malloc_dims(self.shape)
    if self.chunkshape:
      self.dims_chunk = malloc_dims(self.chunkshape)
    ^
/data/arb/distros/tables-2.2/tables/hdf5Extension.pyx:884:35: Cannot convert 'definitions.hsize_t *' to Python object

Error converting Pyrex file to C:
...
    atom.dflt = dflts
    # Create the CArray/EArray
    self.dataset_id = H5ARRAYmake(
      self.parent_id, self.name, version, self.rank,
      self.dims, self.extdim, self.disk_type_id, self.dims_chunk,
    ^
/data/arb/distros/tables-2.2/tables/hdf5Extension.pyx:907:10: Cannot convert Python object to 'definitions.hsize_t *'

Error converting Pyrex file to C:
...
    atom.dflt = dflts
    # Create the CArray/EArray
    self.dataset_id = H5ARRAYmake(
      self.parent_id, self.name, version, self.rank,
      self.dims, self.extdim, self.disk_type_id, self.dims_chunk,
    ^
/data/arb/distros/tables-2.2/tables/hdf5Extension.pyx:907:53: Cannot convert Python object to 'definitions.hsize_t *'

Error converting Pyrex file to C:
...
    self.disk_type_id, self.type_id = self._get_type_ids()
    # Get the atom for this type
    atom = AtomFromHDF5Type(self.disk_type_id)
    # Get the rank for this array object
    if H5ARRAYget_ndims(self.dataset_id, &self.rank) < 0:
    ^
/data/arb/distros/tables-2.2/tables/hdf5Extension.pyx:954:41: Cannot take address of Python variable

Error converting Pyrex file to C:
...
    # Get the rank for this array object
    if H5ARRAYget_ndims(self.dataset_id, &self.rank) < 0:
      raise HDF5ExtError("Problems getting ndims!")
    # Allocate space for the dimension axis info
    self.dims = <hsize_t *>malloc(self.rank * sizeof(hsize_t))
    ^
[Pytables-users] pytables 2.3.1 indexing issue
Hi,

using the configuration:

pytables 2.3.1
numexpr 1.4.1
python 2.7.1 (on amd64 debian and win64)

When I use readWhere to select rows from a table, a selector with multiple operands on the indexed column, e.g. (column > value) & (column < value2), doesn't seem to work, though it works fine with a single operand and on a non-indexed table.

In the test output, the 4th case (indexed, with start and stop operands) returns no selection; contrast with the 2nd case, which shows the non-indexed behavior.

Is this behavior expected?

thanks,
Jeff

--- test script ---

#!/usr/local/bin/python
import tables
import numpy as np
import datetime, time

# create table
def create(name, add_index = False):
    test_file = "%s.hdf" % name
    handle = tables.openFile(test_file, "w")
    table = handle.createTable(handle.root, 'table', dict(
        index = tables.Time64Col(),
        column = tables.StringCol(25),
        values = tables.FloatCol(shape=(3)),
    ))

    # add data
    date = datetime.datetime(2011,1,1,8,0,0)
    r = table.row
    for i in xrange(100):
        r['index'] = time.mktime((date + datetime.timedelta(days=i)).timetuple())
        r['column'] = ("str-%d" % (i % 5))
        r['values'] = np.arange(3)
        r.append()
    table.flush()

    if add_index:
        col = table.cols._f_col('index')
        col.createIndex(filters = None)

    handle.close()
    return test_file

def select(name, start, stop = None):
    """ start and stop are dates """
    test_file = "%s.hdf" % name
    handle = tables.openFile(test_file,"r")
    selectors = []

    # index selector
    selectors.append("(index >= %s)" % time.mktime(start.timetuple()))
    if stop is not None:
        selectors.append("(index <= %s)" % time.mktime(stop.timetuple()))

    # column selector
    selectors.append("((column == 'str-0') | (column == 'str-1'))")

    selector = ' & '.join(selectors)
    print "selector - [f-%s,start-%s,stop-%s] -- %s" % (test_file,start,stop,selector)
    ans = getattr(handle.root,'table').readWhere(selector)
    print "ans - %s" % ans
    handle.close()

# no indexing
create('no_index',add_index = False)
select('no_index', start = datetime.datetime(2011,2,1,0,0,0))
select('no_index', start = datetime.datetime(2011,2,1,0,0,0), stop = datetime.datetime(2011,3,1,0,0,0))

# with indexing
create('with_index',add_index = True)
select('with_index', start = datetime.datetime(2011,2,1,0,0,0))
select('with_index', start = datetime.datetime(2011,2,1,0,0,0), stop = datetime.datetime(2011,3,1,0,0,0))

--- ptdump -v on the no_index.hdf ---

[cow-jreback-/tmp] ptdump -v no_index.hdf
/ (RootGroup) ''
/table (Table(100,)) ''
  description := {
  column: StringCol(itemsize=25, shape=(), dflt='', pos=0),
  index: Time64Col(shape=(), dflt=0.0, pos=1),
  values: Float64Col(shape=(3,), dflt=0.0, pos=2)}
  byteorder := 'little'
  chunkshape := (1149,)

--- ptdump -v on the with_index.hdf ---

[cow-jreback-/tmp] ptdump -v with_index.hdf
/ (RootGroup) ''
/table (Table(100,)) ''
  description := {
  column: StringCol(itemsize=25, shape=(), dflt='', pos=0),
  index: Time64Col(shape=(), dflt=0.0, pos=1),
  values: Float64Col(shape=(3,), dflt=0.0, pos=2)}
  byteorder := 'little'
  chunkshape := (1149,)
  autoIndex := True
  colindexes := {
  index: Index(6, medium, shuffle, zlib(1)).is_CSI=False}

--- test output ---

selector - [f-no_index.hdf,start-2011-02-01 00:00:00,stop-None] -- (index >= 1296536400.0) & ((column == 'str-0') | (column == 'str-1'))
ans - [('str-1', 1296565200.0, [0.0, 1.0, 2.0])
 ('str-0', 1296910800.0, [0.0, 1.0, 2.0])
 ('str-1', 1296997200.0, [0.0, 1.0, 2.0])
 ('str-0', 1297342800.0, [0.0, 1.0, 2.0])
 ('str-1', 1297429200.0, [0.0, 1.0, 2.0])
 ('str-0', 1297774800.0, [0.0, 1.0, 2.0])
 ('str-1', 1297861200.0, [0.0, 1.0, 2.0])
 ('str-0', 1298206800.0, [0.0, 1.0, 2.0])
 ('str-1', 1298293200.0, [0.0, 1.0, 2.0])
 ('str-0', 1298638800.0, [0.0, 1.0, 2.0])
 ('str-1', 1298725200.0, [0.0, 1.0, 2.0])
 ('str-0', 1299070800.0, [0.0, 1.0, 2.0])
 ('str-1', 1299157200.0, [0.0, 1.0, 2.0])
 ('str-0', 1299502800.0, [0.0, 1.0, 2.0])
 ('str-1', 1299589200.0, [0.0, 1.0, 2.0])
 ('str-0', 1299934800.0, [0.0, 1.0, 2.0])
 ('str-1', 1300017600.0, [0.0, 1.0, 2.0])
 ('str-0', 1300363200.0, [0.0, 1.0, 2.0])
 ('str-1', 1300449600.0, [0.0, 1.0, 2.0])
 ('str-0', 1300795200.0, [0.0, 1.0, 2.0])
 ('str-1', 1300881600.0, [0.0, 1.0, 2.0])
 ('str-0', 1301227200.0, [0.0, 1.0, 2.0])
 ('str-1', 1301313600.0, [0.0, 1.0, 2.0])
 ('str-0', 1301659200.0, [0.0, 1.0, 2.0])
 ('str-1', 1301745600.0, [0.0, 1.0, 2.0])
 ('str-0', 1302091200.0, [0.0, 1.0, 2.0])
 ('str-1', 1302177600.0, [0.0, 1.0, 2.0])]

selector - [f-no_index.hdf,start-2011-02-01 00:00:00,stop-2011-03-01 00:00:00] -- (index >= 1296536400.0) & (index <= 1298955600.0) & ((column == 'str-0') | (column == 'str-1'))
ans - [('str-1', 1296565200.0, [0.0, 1.0, 2.0])
 ('str-0', 1296910800.0, [0.0, 1.0, 2.0])
 ('str-1', 1296997200.0, [0.0, 1.0, 2.0])
Re: [Pytables-users] variable length strings in tables?
thanks

created https://github.com/PyTables/PyTables/issues/198

From: Anthony Scopatz scop...@gmail.com
To: Jeff Reback j...@reback.net; Discussion list for PyTables pytables-users@lists.sourceforge.net
Sent: Monday, December 3, 2012 11:15 AM
Subject: Re: [Pytables-users] variable length strings in tables?

On Sun, Dec 2, 2012 at 2:49 PM, Jeff Reback jreb...@yahoo.com wrote:

> Hi,
>
> Pandas uses pytables as a storage backend, and this has worked out quite well.
> fyi ... http://pandas.pydata.org/pandas-docs/dev/io.html#hdf5-pytables
>
> I have a particular use case where I build a table, then later append to it. Fixed types are no problem. However, I often index these tables by StringCols, which I pre-allocate to the largest size I think I'll need. So I wanted to think about supporting variable-length string columns in the table.
>
> Any thoughts on these strategies:
>
> 1) Is there any way to directly support a variable-length string in a particular column? (e.g. VLStringCol doesn't exist, but a stand-alone VLStringAtom does)

This is possible, as the underlying HDF5 library will support it. However, no one has had the time to write it. Please open an issue (or possibly a pull request) related to this.

> 2) As an alternative, I could store along with the table a VLArray with the same number of rows as the table and keep the string data there. Of course I would have to keep the two synchronized (and this doesn't help with an 'indexing' column, just with 'data' columns).

This is what I do in PyTables and HDF5 itself. It works out quite well for me. This has the advantage that the VLString data get compressed separately from the numeric data (if using compression). Yes, it is one more thing to manage, but the file sizes are significantly smaller.

Be Well
Anthony

> thanks,
> Jeff

___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
[Pytables-users] readWhere, number of selectors issue
It seems there is a limit to the condition syntax when using readWhere: I get various exceptions when passing an increasing number of terms.

Is this some kind of hard-coded limit? Is there a way to pre-compile the condition and test for it (e.g. when I am actually creating the condition)? My alternative is simply to drop that part of the condition and filter afterwards.

thanks,
Jeff

ans - [n-2 ,len_selector-58 ] -- (399,)
ans - [n-10 ,len_selector-234 ] -- (999,)
ans - [n-100 ,len_selector-2304 ] -- (999,)
ans - [n-200 ,len_selector-4704 ] -- (999,)
ans - [n-254 ,len_selector-6000 ] -- chr() arg not in range(256)
ans - [n-255 ,len_selector-6024 ] -- chr() arg not in range(256)
ans - [n-300 ,len_selector-7104 ] -- chr() arg not in range(256)
ans - [n-400 ,len_selector-9504 ] -- maximum recursion depth exceeded while calling a Python object
ans - [n-500 ,len_selector-11904 ] -- maximum recursion depth exceeded while calling a Python object

script to reproduce

#!/usr/local/bin/python
import tables
import numpy as np
import datetime, time

test_file = 'test_select.h5'
handle = tables.openFile(test_file, "w")
node = handle.createGroup(handle.root, 'test')
table = handle.createTable(node, 'table', dict(
    index = tables.Int64Col(),
    column = tables.StringCol(25),
    values = tables.FloatCol(shape=(3)),
))

# add data
r = table.row
for i in xrange(1000):
    r['index'] = i
    r['column'] = ("str-%d" % (i % 5))
    r['values'] = np.arange(3)
    r.append()
table.flush()
handle.close()

def read_for(n):
    handle = tables.openFile(test_file,"r")

    # one index term AND-ed with n OR-ed equality terms on the string column
    selector = "(index >= 1) & %s" % '(' + ' | '.join([ "(column == 'str-%s')" % v for v in range(n) ]) + ')'
    #print "selector - [%s] -- %s" % (n,selector)

    try:
        ans = handle.root.test.table.readWhere(selector)
        print "ans - [n-%-20.20s,len_selector-%-20.20s] -- %s" % (n,len(selector),ans.shape)
    except (Exception), detail:
        print "ans - [n-%-20.20s,len_selector-%-20.20s] -- %s" % (n,len(selector),str(detail))

    handle.close()

for n in [ 2, 10, 100, 200, 254, 255, 300, 400, 500 ]:
    read_for(n)
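One way to stay under whatever limit is being hit is to split the OR-ed terms into several smaller conditions and run one readWhere per condition. The sketch below only builds the condition strings and does not touch PyTables. The helper name `chunked_conditions` and the default of 100 terms per condition are assumptions for illustration; 100 was picked only because the output above shows n=200 succeeding and n=254 failing, not because it is a documented limit.

```python
def chunked_conditions(values, column="column", prefix="(index >= 1)", max_terms=100):
    """Yield condition strings that each OR together at most max_terms equality tests.

    Hypothetical helper: the names and the max_terms default are assumptions,
    not part of the PyTables API.
    """
    values = list(values)
    for start in range(0, len(values), max_terms):
        batch = values[start:start + max_terms]
        ors = " | ".join("(%s == '%s')" % (column, v) for v in batch)
        # Same shape as the selector in the reproduction script.
        yield "%s & (%s)" % (prefix, ors)


# 254 terms is the first size that failed in the reported output.
conditions = list(chunked_conditions(["str-%d" % v for v in range(254)]))
print(len(conditions))                  # number of separate queries needed
print([len(c) for c in conditions])     # length of each condition string
```

Each string could then be passed to readWhere on its own and the results concatenated. As long as the values are distinct, a row can match at most one batch, so no de-duplication should be needed.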
[Pytables-users] pytable 3 - with encoding
anthony,
[Pytables-users] pytable 30 - encoding
anthony,

where am I going wrong here?

#!/usr/local/bin/python3
import tables
import numpy as np
import datetime, time

encoding = 'UTF-8'
test_file = 'test_select.h5'
handle = tables.openFile(test_file, "w")
node = handle.createGroup(handle.root, 'test')
table = handle.createTable(node, 'table', dict(
    index = tables.Int64Col(),
    column = tables.StringCol(25),
    values = tables.FloatCol(shape=(3)),
))

# add data
r = table.row
for i in range(10):
    r['index'] = i
    r['column'] = ("str-%d" % (i % 5)).encode(encoding)
    r['values'] = np.arange(3)
    r.append()
table.flush()
handle.close()

# read
handle = tables.openFile(test_file,"r")
result = handle.root.test.table.read()
print("table data\n")
print(result)

# where
print("\nselector\n")
selector = "(column == 'str-2')".encode(encoding)
print(selector)
result = handle.root.test.table.readWhere(selector)
print(result)

and the following output:

[sheep-jreback-/code/arb/test] python3 pytables-3.py
table data

[(b'str-0', 0, [0.0, 1.0, 2.0]) (b'str-1', 1, [0.0, 1.0, 2.0])
 (b'str-2', 2, [0.0, 1.0, 2.0]) (b'str-3', 3, [0.0, 1.0, 2.0])
 (b'str-4', 4, [0.0, 1.0, 2.0]) (b'str-0', 5, [0.0, 1.0, 2.0])
 (b'str-1', 6, [0.0, 1.0, 2.0]) (b'str-2', 7, [0.0, 1.0, 2.0])
 (b'str-3', 8, [0.0, 1.0, 2.0]) (b'str-4', 9, [0.0, 1.0, 2.0])]

selector

b"(column == 'str-2')"
Traceback (most recent call last):
  File "pytables-3.py", line 37, in <module>
    result = handle.root.test.table.readWhere(selector)
  File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/_past.py", line 35, in oldfunc
    return obj(*args, **kwargs)
  File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py", line 1522, in read_where
    self._where(condition, condvars, start, stop, step)]
  File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py", line 1484, in _where
    compiled = self._compile_condition(condition, condvars)
  File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py", line 1358, in _compile_condition
    compiled = compile_condition(condition, typemap, indexedcols)
  File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/conditions.py", line 419, in compile_condition
    func = NumExpr(expr, signature)
  File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 559, in NumExpr
    precompile(ex, signature, context)
  File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 511, in precompile
    constants_order, constants = getConstants(ast)
  File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 294, in getConstants
    for a in constants_order]
  File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 294, in <listcomp>
    for a in constants_order]
  File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 284, in convertConstantToKind
    return kind_to_type[kind](x)
TypeError: string argument without an encoding
Closing remaining open files: test_select.h5... done
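The final frame of the traceback, `kind_to_type[kind](x)`, fails with exactly the message Python 3 gives when the built-in `bytes` is called on a `str` without an encoding. The sketch below reproduces that message in isolation. It assumes, without confirmation from the numexpr source, that the string kind maps to the built-in `bytes` and that the constant `'str-2'` reaches it as a `str`; the helper names `to_bytes_naive` and `to_bytes` are hypothetical.

```python
def to_bytes_naive(value):
    # Calls the built-in bytes type directly, as the failing frame appears to do
    # (an assumption about numexpr internals, not confirmed by the thread).
    return bytes(value)


def to_bytes(value, encoding="utf-8"):
    # A str has to be encoded explicitly; bytes-like input is passed through.
    if isinstance(value, str):
        return value.encode(encoding)
    return bytes(value)


message = None
try:
    to_bytes_naive("str-2")
except TypeError as exc:
    message = str(exc)

print(message)              # the TypeError text raised for a str argument
print(to_bytes("str-2"))    # explicit encoding avoids the error
```

If that assumption holds, it would also explain why encoding the whole condition string makes no difference: the string constant inside the condition is parsed out again before the conversion happens.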
Re: [Pytables-users] pytable 30 - encoding
Anthony,

I am using numexpr 2.1 (latest).

This is puzzling; it doesn't matter what I pass (bytes or str), I get the same result?

(column == 'str-2')
> /mnt/code/arb/test/pytables-3.py(38)<module>()
-> result = handle.root.test.table.readWhere(selector)
(Pdb) handle.root.test.table.readWhere(selector)
*** TypeError: string argument without an encoding
(Pdb) handle.root.test.table.readWhere(selector.encode(encoding))
*** TypeError: string argument without an encoding
(Pdb)

From: Anthony Scopatz scop...@gmail.com
To: Jeff Reback j...@reback.net; Discussion list for PyTables pytables-users@lists.sourceforge.net
Sent: Tuesday, June 4, 2013 12:25 PM
Subject: Re: [Pytables-users] pytable 30 - encoding

Hi Jeff,

Have you also updated numexpr to the most recent version? The error is coming from numexpr not compiling the expression correctly.

Also, you might try making selector a str, rather than bytes:

selector = "(column == 'str-2')"

rather than

selector = "(column == 'str-2')".encode(encoding)

Be Well
Anthony

On Tue, Jun 4, 2013 at 8:51 AM, Jeff Reback jreb...@yahoo.com wrote:

> anthony, where am I going wrong here?
> [snip]
Re: [Pytables-users] pytable 30 - encoding
Anthony,

I created an issue with more info. I am not sure if this is a bug, or just the way both numexpr and pytables treat strings that need to touch an encoded value; I found a workaround by specifying the condvars to readWhere.

Any more thoughts on this?

thanks
Jeff

https://github.com/PyTables/PyTables/issues/265

From: Anthony Scopatz scop...@gmail.com
To: Jeff Reback j...@reback.net
Cc: Discussion list for PyTables pytables-users@lists.sourceforge.net
Sent: Tuesday, June 4, 2013 6:39 PM
Subject: Re: [Pytables-users] pytable 30 - encoding

Hi Jeff,

Hmmm. Could you try doing the same thing on just an in-memory numpy array using numexpr? If this succeeds, it tells us that the problem is in PyTables, not numexpr.

Be Well
Anthony

On Tue, Jun 4, 2013 at 11:35 AM, Jeff Reback jreb...@yahoo.com wrote:

> Anthony,
>
> I am using numexpr 2.1 (latest).
>
> This is puzzling; it doesn't matter what I pass (bytes or str), I get the same result?
> [snip]
Re: [Pytables-users] dates and space
Here is a pandas solution for doing just this (which uses PyTables under the hood):

# create a frame
In [45]: df = DataFrame(randn(1000,2),index=date_range('20000101',periods=1000))

In [53]: df
Out[53]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1000 entries, 2000-01-01 00:00:00 to 2002-09-26 00:00:00
Freq: D
Data columns (total 2 columns):
0    1000  non-null values
1    1000  non-null values
dtypes: float64(2)

# store it as a table
In [46]: store = pd.HDFStore('test.h5',mode='w')

In [47]: store.append('df',df)

# select out the index (a datetimeindex in this case)
In [48]: c = store.select_column('df','index')

# get the coordinates of matching index
In [49]: coords = c[pd.DatetimeIndex(c).month==5]

# select those rows
In [51]: from pandas.io.pytables import Coordinates

In [50]: store.select('df',where=Coordinates(coords.index,None,None))
Out[50]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 93 entries, 2000-05-01 00:00:00 to 2002-05-31 00:00:00
Data columns (total 2 columns):
0    93  non-null values
1    93  non-null values
dtypes: float64(2)

From: Anthony Scopatz scop...@gmail.com
To: Discussion list for PyTables pytables-users@lists.sourceforge.net
Sent: Monday, August 5, 2013 2:54 PM
Subject: Re: [Pytables-users] dates and space

On Mon, Aug 5, 2013 at 1:38 PM, Oleksandr Huziy guziy.sa...@gmail.com wrote:

> Hi Pytables users and developers:
>
> I have a few questions to which I could not find the answer in the documentation. Thank you in advance for any help.
>
> 1. If I store dates in Pytables, does it mean I could write queries like table.where('date.month == 5')? Is there a common way to convert from python's datetime to pytables' datetime and back?

Hello Sasha,

Pytables times are actually based on C time, not Python's datetimes. This is because they use the HDF5 time types. So unfortunately you can't write queries like the one above. (You'd need to talk to numexpr about getting that kind of query implemented ~_~.)

Instead I would suggest that you store your times as Float64Atoms and Float64Cols and then use arithmetic to figure out the query:

table.where("(x / 3600 / 24)%12 == 5")

This is not perfect...

> 2. I have several variables stored in the same file, in a separate table for each variable. I use separate columns (year, month, day, hour, minute, second) to mark the time for a record (the records are not necessarily ordered in time), and this is for each variable. I was thinking of putting all the variables in the same table and putting missing values for the variables which do not have outputs for a given time step. Is it possible to put None as a default value into a table (so I could easily filter dummy rows)?

It is not possible to use None, since that is a Python object of a different type than the other integers you are trying to stick in the column. I would suggest that you use values with no actual meaning. If you are using normal ints you can use -1 to represent missing values. If you are using unsigned ints you have to pick other values, like 13 for month on the Julian calendar.

> But then again the data comes in chunks; does this mean I would have to check if a row with the same date already exists for a different variable?

No, you wouldn't. You can store the same data multiple times in different rows.

> I don't really like the ideas in 2, which are intended to save space, but maybe all I need is a good compression level? Can somebody advise me on this?

Compression would definitely help here, since the date numbers are all fairly similar. Probably even a compression level of 1 would work. Keep in mind that sometimes using compression actually speeds things up (see the starving CPU problem). You might just need to experiment with a few different compression levels to see how things go. 0, 1, 5, 9 gives you a good spread.

Be Well
Anthony

> Cheers
> --
> Oleksandr (Sasha) Huziy
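For the month query discussed in this thread, one alternative to per-row arithmetic is to precompute the epoch-second bounds of the wanted month for each year and build a range condition from them. The sketch below is a minimal illustration under several assumptions: times are stored as seconds since the epoch in UTC, the column is called `x` as in Anthony's example, and the helper names `month_ranges` and `month_condition` are hypothetical. It only builds the condition string and does not call PyTables.

```python
import calendar
import datetime


def month_ranges(month, first_year, last_year):
    """Return (start, stop) epoch-second pairs, in UTC, covering `month` in each year."""
    ranges = []
    for year in range(first_year, last_year + 1):
        start = datetime.datetime(year, month, 1)
        # The exclusive upper bound is the first day of the following month.
        if month == 12:
            stop = datetime.datetime(year + 1, 1, 1)
        else:
            stop = datetime.datetime(year, month + 1, 1)
        ranges.append((calendar.timegm(start.timetuple()),
                       calendar.timegm(stop.timetuple())))
    return ranges


def month_condition(column, month, first_year, last_year):
    """Build one OR-ed range test per year, using only comparison, & and | operators."""
    parts = ["((%s >= %d) & (%s < %d))" % (column, a, column, b)
             for a, b in month_ranges(month, first_year, last_year)]
    return " | ".join(parts)


condition = month_condition("x", 5, 2000, 2002)
print(condition)
```

The resulting string could be passed to table.where in place of the modulo expression. The number of OR-ed terms grows with the number of years covered, so for long records it may be worth issuing one query per year instead.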