[
https://issues.apache.org/jira/browse/ARROW-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345261#comment-16345261
]
ASF GitHub Bot commented on ARROW-1705:
---------------------------------------
wesm commented on a change in pull request #1530: ARROW-1705: [Python] allow
building array from dicts
URL: https://github.com/apache/arrow/pull/1530#discussion_r164783172
##########
File path: cpp/src/arrow/python/builtin_convert.cc
##########
@@ -722,25 +736,60 @@ class ListConverter : public TypedConverterVisitor<ListBuilder, ListConverter> {
public:
Status Init(ArrayBuilder* builder) override;
- Status AppendItem(const OwnedRef& item) {
+ Status AppendItem(PyObject* obj) override {
RETURN_NOT_OK(typed_builder_->Append());
- PyObject* item_obj = item.obj();
- const auto list_size = static_cast<int64_t>(PySequence_Size(item_obj));
- return value_converter_->AppendData(item_obj, list_size);
+ const auto list_size = static_cast<int64_t>(PySequence_Size(obj));
+ return value_converter_->AppendMultiple(obj, list_size);
}
protected:
std::shared_ptr<SeqConverter> value_converter_;
};
+class StructConverter : public TypedConverterVisitor<StructBuilder, StructConverter> {
+ public:
+ Status Init(ArrayBuilder* builder) override;
+
+ Status AppendItem(PyObject* obj) override {
+ RETURN_NOT_OK(typed_builder_->Append());
+ if (!PyDict_Check(obj)) {
+ return Status::TypeError("dict value expected for struct type");
+ }
+ // NOTE we're ignoring any extraneous dict items
+ for (int i = 0; i < num_fields_; i++) {
+ PyObject* nameobj = PyList_GET_ITEM(field_name_list_.obj(), i);
+ PyObject* valueobj = PyDict_GetItem(obj, nameobj); // borrowed
+ RETURN_IF_PYERROR();
+ RETURN_NOT_OK(value_converters_[i]->AppendSingle(valueobj ? valueobj : Py_None));
Review comment:
I'm not sure whether this has a measurable performance impact (it would be
good to know for future reference), but in `RecordBatchBuilder` I made a
separate vector of raw pointers to the internal builders to avoid the extra
shared_ptr overhead in the inner loop:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/table_builder.h#L105
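
For reference, the pattern described above could be sketched roughly as follows. This is a standalone illustration, not Arrow code: `FieldConverter` and `StructLikeConverter` are hypothetical stand-ins for the per-field converters and the struct converter. The owning `shared_ptr` vector is kept for lifetime management, and a parallel vector of raw pointers is filled once at construction so the per-row loop only touches plain pointers. Whether this is measurably faster is exactly the open question in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a per-field converter/builder (not an Arrow class).
class FieldConverter {
 public:
  void AppendSingle(int64_t value) {
    sum_ += value;
    ++count_;
  }
  int64_t sum() const { return sum_; }
  int64_t count() const { return count_; }

 private:
  int64_t sum_ = 0;
  int64_t count_ = 0;
};

// Hypothetical struct-like converter holding one child converter per field.
class StructLikeConverter {
 public:
  explicit StructLikeConverter(int num_fields) {
    owned_.reserve(num_fields);
    raw_.reserve(num_fields);
    for (int i = 0; i < num_fields; ++i) {
      owned_.push_back(std::make_shared<FieldConverter>());
      // Cache the raw pointer once; owned_ keeps the object alive.
      raw_.push_back(owned_.back().get());
    }
  }

  // Inner loop: iterates the raw-pointer vector only.
  // Values beyond the number of fields are ignored.
  void AppendRow(const std::vector<int64_t>& row) {
    const std::size_t n = raw_.size() < row.size() ? raw_.size() : row.size();
    for (std::size_t i = 0; i < n; ++i) {
      raw_[i]->AppendSingle(row[i]);
    }
  }

  const FieldConverter& field(int i) const { return *raw_[i]; }

 private:
  std::vector<std::shared_ptr<FieldConverter>> owned_;  // ownership
  std::vector<FieldConverter*> raw_;                    // non-owning cache
};
```

The raw pointers stay valid only as long as the owning vector holds the same objects, so in this sketch both vectors are filled together and never modified afterwards.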
----------------------------------------------------------------
> [Python] Create StructArray from sequence of dicts given a known data type
> --------------------------------------------------------------------------
>
> Key: ARROW-1705
> URL: https://issues.apache.org/jira/browse/ARROW-1705
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Reporter: Wes McKinney
> Assignee: Antoine Pitrou
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
>
> See https://github.com/apache/arrow/issues/1217