Not sure about the conversion, but regarding self_destruct: the problem is that it only provides memory savings in limited situations that are hard to figure out from the outside. When enabled, PyArrow will always discard its reference to each array after converting it, and if there are no other references, that frees the array. But different arrays may be backed by the same underlying memory buffer (this is generally true for IPC and Flight, for example), so freeing the array won't actually free any memory, since the buffer is still alive. It would only save memory if you ensure each array is actually backed by its own memory allocation (which right now would generally mean copying data up front!).

On Thu, Aug 31, 2023, at 11:11, Li Jin wrote:
> Hi,
>
> I am working on some code where I have a list of pa.Arrays and I am
> creating a pandas.DataFrame from it. I also want to set the index of the
> pd.DataFrame to be the first Array in the list.
>
> Currently I am doing sth like:
> "
> df = pa.Table.from_arrays(arrs, names=input_names).to_pandas()
> df.set_index(input_names[0], inplace=True)
> "
>
> I am curious if this is the best I can do? Also I wonder if it is still
> worthwhile to use the "self_destruct=True" option here (I noticed it has
> been EXPERIMENTAL for a long time)
>
> Thanks!
> Li