[
https://issues.apache.org/jira/browse/ARROW-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349258#comment-16349258
]
ASF GitHub Bot commented on ARROW-1757:
---------------------------------------
wesm commented on a change in pull request #1535: ARROW-1757: [C++] Add
DictionaryArray::FromArrays alternate ctor that can check or sanitized
"untrusted" indices
URL: https://github.com/apache/arrow/pull/1535#discussion_r165484561
##########
File path: cpp/src/arrow/array.cc
##########
@@ -492,11 +492,52 @@ DictionaryArray::DictionaryArray(const std::shared_ptr<DataType>& type,
SetData(data);
}
+Status DictionaryArray::FromArrays(const std::shared_ptr<DataType>& type,
+ const std::shared_ptr<Array>& indices,
+ std::shared_ptr<Array>* out) {
+ if (indices->length() == 0) {
+ return Status::Invalid("Dictionary indices must have non-zero length");
+ }
+
+ DCHECK_EQ(type->id(), Type::DICTIONARY);
+ std::shared_ptr<DictionaryType> dict = std::static_pointer_cast<DictionaryType>(type);
+ DCHECK_EQ(indices->type_id(), dict->index_type()->id());
+
+ int64_t range = dict->dictionary()->length();
+ bool is_valid = true;
+
+ switch (indices->type_id()) {
+ case Type::INT8:
+ is_valid = SanityCheck<Int8Type>(indices, range);
+ break;
+ case Type::INT16:
+ is_valid = SanityCheck<Int16Type>(indices, range);
+ break;
+ case Type::INT32:
+ is_valid = SanityCheck<Int32Type>(indices, range);
+ break;
+ case Type::INT64:
+ is_valid = SanityCheck<Int64Type>(indices, range);
+ break;
+ default:
+ std::stringstream ss;
+ ss << "Categorical index type not supported: "
+ << indices->type()->ToString();
+ return Status::NotImplemented(ss.str());
+ }
+
+ if (!is_valid) {
+ return Status::Invalid("Invalid dictionary indices");
Review comment:
This error message should be more specific.
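To illustrate what a more specific message might contain, here is a minimal standalone sketch. It is not Arrow code and not the change that was eventually merged: `DescribeFirstBadIndex` is a hypothetical helper that works on a plain `std::vector` rather than an Arrow array, ignores null slots, and assumes valid indices are those in `[0, dict_length)`. It reports the first offending value and its position instead of a generic "Invalid dictionary indices".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the Arrow API). Returns an empty string
// when every index lies in [0, dict_length); otherwise returns a message
// naming the first out-of-bounds value and the position where it occurs.
template <typename IndexT>
std::string DescribeFirstBadIndex(const std::vector<IndexT>& indices,
                                  int64_t dict_length) {
  assert(dict_length >= 0);
  for (std::size_t i = 0; i < indices.size(); ++i) {
    // Widen first so that int8_t values print as numbers, not characters.
    const int64_t value = static_cast<int64_t>(indices[i]);
    if (value < 0 || value >= dict_length) {
      std::stringstream ss;
      ss << "Dictionary index " << value << " at position " << i
         << " is out of bounds for a dictionary of length " << dict_length;
      return ss.str();
    }
  }
  return std::string();
}
```

In the function under review, a non-empty result of this kind could be passed to `Status::Invalid` in place of the fixed string, so the caller sees which index failed and why.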
----------------------------------------------------------------
> [C++] Add DictionaryArray::FromArrays alternate ctor that can check or
> sanitized "untrusted" indices
> ----------------------------------------------------------------------------------------------------
>
> Key: ARROW-1757
> URL: https://issues.apache.org/jira/browse/ARROW-1757
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Wes McKinney
> Assignee: Panchen Xue
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Related to ARROW-1658. This is related to the offset sanitization in
> {{ListArray::FromArrays}}