Discussion: Native PDX Reader/Writer std::string

Jacob Barrett Tue, 19 Dec 2017 13:36:04 -0800

Java String is an object and as such a member of type String can be null or
allocated in the heap. In C++ std::string is a value and as such a member
of type std::string can never be null and is (usually) stack allocated with
the contents allocated (usually) on the heap. In C/C++ char* is a decayed
char[] and a such a member of type char* can be null or a pointer to an
allocated char[] either on the heap or the stack.


As we move away from char* representations of strings which support null to
std::string that does not we run into an interesting problem to solve. What
to do about null strings in a PDX object? The original PdxReader readString
returned a char*, which could be null or allocated a char[] on the heap and
returned the pointer (usually, though I found some places where it returned
the address to a static constant char[] but without const qualifier, bad).
This version of PdxReader::readString then supports null string values
coming from Java.

Looking at options for converting to std::string we have the following:

*1) std::string PdxReader::readString(const std::string& fieldName)*
Most natural for C++11 but does not support null. A null PDX string value
would be converted into an empty std::string. Upon writing back to PDX the
null value would be replaced with and empty PDX string value. This could
impact the other domain objects derived from this same PDX type if they
treat null and empty strings differently. For example:
class Foo {
std::string naturalStringMember;

~Foo() = default;

void fromData(PdxReader reader) {
  // RVO moved contents
  unnaturalStringMember = readString("naturalStringMember");
  // null value lost
}
};


*2) std::string* PdxReader::readString(const std::string& fieldName)*
The subtle difference there is returning raw pointer to a heap allocated
std::string, which is very very unnatural to C++. The domain object would
either hold the raw pointer or check for null and copy to a std::string
value member. In either case they would have to free the allocated
std::string when done. For example:
class Foo {
std::string* unnaturalStringMember;
std::string naturalStringMember;

~Foo() {
  if (unnaturalStringMember) {
    delete unnaturalStringMember;
  }
}

void fromData(PdxReader reader) {
  unnaturalStringMember = readString("unnaturalStringMember");
  if (auto tmp = readString("naturalStringMember")) {
    // move contents
    naturalStringMember = std::move(*naturalStringMember);
    // delete
    delete naturalStringMember;
  } else {
    // null value lost
  }
}
};
*YUCK!!*

*3) std::unique_ptr<std::string> PdxReader::readString(const std::string&
fieldName)*
This shares some of the similar issues with unnaturally having a heap
allocated std::string but solves the ownership ambiguity and deleting
issues. For example:
class Foo {
std::unique_ptr<std::string> unnaturalStringMember;
std::string naturalStringMember;

~Foo() = default;

void fromData(PdxReader reader) {
  unnaturalStringMember = readString("unnaturalStringMember");
  if (auto tmp = readString("naturalStringMember")) {
    // move contents and delete
    naturalStringMember = std::move(*tmp);
  } else {
    // null value lost
  }
}
};
*Eh!*

4) Given that C++ does not allow the same method signature only differing
in return type we can't have all these methods on the interface unless we
named them differently as well. For example readString and
readNullableString. Providing both puts the power in the implementor but
maybe we should have an opinionated approach.

std::string PdxReader::readString(const std::string& fieldName);
std::unique_ptr<std::string> PdxReader::readNullableString(const
std::string& fieldName);

class Foo {
std::unique_ptr<std::string> nullableStringMember;
std::string naturalStringMember;

~Foo() = default;

void fromData(PdxReader reader) {
  nullableStringMember = readNullabletring("nullableStringMember");
  naturalStringMember = readString("uniqueStringMember");
}
};


5) Alternatively one could use out parameters, though we have been trying
to get away from out parameters since they don't match the "functional"
preference.
void PdxReader::readString(const std::string& fieldName
                           std::string& value);
void PdxReader::readString(const std::string& fieldName,
                           std::string* value);
void PdxReader::readString(const std::string& fieldName,
                           std::unique_ptr<std::string>& value);

class Foo {
std::string* rawStringMember;
std::unique_ptr<std::string> uniqueStringMember;
std::string naturalStringMember;

~Foo() = default;

void fromData(PdxReader reader) {
  readString("rawStringMember", rawStringMember);
  readString("uniqueStringMember", uniqueStringMember);
  readString("naturalStringMember", naturalStringMember);
}
};


Does supporting null really buy us anything worth the pain an anguish of
not looking very C++? My preference is probably towards non-null by
providing a "nullable" version of the method for backwards compatibility
(option 4).

Thoughts?

What about arrays of strings? There were supported as a decayed char[][] as
char**. Defined as char** PdxReader::readStringArray(const std::string&
fieldName, int32_t& len). Where len was an out param indicating the size of
the array. The function could return null since Java arrays are objects and
therefor can be null. Each element of the array could also be null since
String in Java is nullable. To cleanup one had to iterate the array and
delete each element if not null then delete the array. Should we consider
putting this into a std::vector<std::string>. Again we lose the null
support but maybe it is worth it. Alternatively we could support a nullable
version as std::unique_ptr<vector<std::unique_ptr<std::string>>>.

More thoughts?

-Jake

Discussion: Native PDX Reader/Writer std::string

Reply via email to