tuxji commented on a change in pull request #768:
URL: https://github.com/apache/daffodil/pull/768#discussion_r839634663



##########
File path: 
daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/Base4.scala
##########
@@ -31,10 +31,46 @@ object BitsCharsetBase4LSBF extends {
   override val requiredBitOrder = BitOrder.LeastSignificantBitFirst
 } with BitsCharsetNonByteSize
 
+final class BitsCharsetBase4LSBFDefinition
+  extends BitsCharsetDefinition {
+
+  override def name() = "X-DFDL-BASE4-LSBF"
+
+  override def newFactory() = {
+    new BitsCharsetBase4LSBFFactory()
+  }
+}
+
+class BitsCharsetBase4LSBFFactory()

Review comment:
       Missed a final here :).

##########
File path: 
daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/BitsCharsetDefinition.scala
##########
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.processors.charset
+
+/**
+ * Must be implemented by all charsets.
+ *

Review comment:
       Please remove these lines.  I'd meant to ask you to move the first 
sentence after the second sentence, and you've done that so these lines are 
redundant.

##########
File path: 
daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/BitsCharsetFactory.scala
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.processors.charset
+
+/**
+ * Factory for a Daffodil charset.
+ *
+ * This is the serialized object which is saved as part of a processor.
+ * It constructs a charset at runtime when newInstance() is called.
+ *
+ * This must be implemented as part of implementing a Daffodil charset.
+ */
+abstract class BitsCharsetFactory
+  extends Serializable {
+
+  def newInstance(): BitsCharset
+}

Review comment:
       Just a thought.  In Scala, we don't have to put each trait/object/class 
in its own file like we do in Java.  You could define BitsCharsetDefinition, 
BitsCharsetDefinitionRegistry, and BitsCharsetFactory in the same file 
(BitsCharsetDefinition.scala) and that'll mean the extra tests logically belong 
in TestBitsCharsetDefinition.scala (no need to add another test class).  Also, 
you can remove newInstance()'s parentheses.

##########
File path: 
daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/BitsCharsetDefinitionRegistry.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.processors.charset
+
+import org.apache.daffodil.exceptions.ThrowsSDE
+import org.apache.daffodil.util.SimpleNamedServiceLoader
+
+object BitsCharsetDefinitionRegistry {
+
+  private lazy val bitsCharsetDefinitionMap: Map[String, 
BitsCharsetDefinition] =
+    
SimpleNamedServiceLoader.loadClass[BitsCharsetDefinition](classOf[BitsCharsetDefinition])
+
+  /**
+   * Given name, finds the factory for the transformer. SDE otherwise.
+   *
+   * The state is passed in order to provide diagnostic context if not found.
+   */
+  def find(name: String, context: ThrowsSDE): BitsCharsetDefinition = {
+    val optDefinition: Option[BitsCharsetDefinition] = 
bitsCharsetDefinitionMap.get(name)
+    if (optDefinition.isEmpty) {
+      val choices = bitsCharsetDefinitionMap.keySet.mkString(", ")
+      context.SDE("The encoding '%s' was not found. Available choices are: 
%s", name, choices)
+    } else {
+      optDefinition.get
+    }
+  }
+
+  def find(name: String): BitsCharsetDefinition = {
+    val optDefinition: Option[BitsCharsetDefinition] = 
bitsCharsetDefinitionMap.get(name)
+    if (optDefinition.isEmpty) {
+      val choices = bitsCharsetDefinitionMap.keySet.mkString(", ")
+      SDE("The encoding '%s' was not found. Available choices are: %s", name, 
choices)
+    } else {
+      optDefinition.get
+    }
+  }
+
+  def SDE(id: String, args: Any*): Nothing = {
+    throw new Exception(id.format(args: _*))
+  }
+
+  def supportedEncodingsString = {bitsCharsetDefinitionMap.keySet.mkString(", 
")}

Review comment:
       When you have a one-liner, you don't need braces, and it would be nice 
to add a test to 
[TestBitsCharsetDefinition.scala](https://github.com/apache/daffodil/pull/768/files#diff-c9938dc11841543a27f49c21e58d1a2d5026854c2ca9a026aed6875a9a70bd9d)
 which calls this function and verifies the existence of an encoding's name 
with a comma within the returned string to check the string has the right 
content.  While you're at it, let's also add two negative tests which call each 
find method with an impossible to find encoding in order to put the error paths 
under test coverage.

##########
File path: 
daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/BitsCharsetDefinitionRegistry.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.processors.charset
+
+import org.apache.daffodil.exceptions.ThrowsSDE
+import org.apache.daffodil.util.SimpleNamedServiceLoader
+
+object BitsCharsetDefinitionRegistry {

Review comment:
       It would be wonderful to add a comment here briefly explaining what this 
object is used for, something like "Finds all pluggable BitCharsets and makes 
them available to Daffodil."
   
   P.S. In later comments, I'm now suggesting that you chop `Registry` off 
`BitsCharsetDefinitionRegistry`'s name and move it to 
BitsCharsetDefinition.scala.

##########
File path: 
daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/BitsCharsetDefinition.scala
##########
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.processors.charset
+
+/**
+ * Must be implemented by all charsets.
+ *
+ * These are the classes which must be dynamically loaded in order to add a 
charset implementation
+ * to Daffodil. All charsets must implement this class
+ *
+ * A saved processor does NOT serialize this class. It calls the newFactory 
method and serializes
+ * the resulting BitsCharsetFactory.
+ */
+abstract class BitsCharsetDefinition {
+
+  def name(): String
+
+  def newFactory(): BitsCharsetFactory

Review comment:
       Since these methods have no side effects and no arguments, we can remove 
the parentheses.

##########
File path: 
daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/Hex.scala
##########
@@ -31,6 +31,24 @@ object BitsCharsetHexLSBF extends {
   override val requiredBitOrder = BitOrder.LeastSignificantBitFirst
 } with BitsCharsetNonByteSize
 
+final class BitsCharsetHexLSBFDefinition
+  extends BitsCharsetDefinition {
+
+  override def name() = "X-DFDL-HEX-LSBF"
+
+  override def newFactory() = {
+    new BitsCharsetHexLSBFFactory()
+  }
+}
+
+class BitsCharsetHexLSBFFactory()

Review comment:
       Another final here :)

##########
File path: 
daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/CharsetUtils.scala
##########
@@ -33,11 +33,10 @@ object CharsetUtils {
    * encodings as well as the standard ones.
    */
   def getCharset(name: String): BitsCharset = {
-    val cs = DaffodilCharsetProvider.charsetForName(name)
-    cs
+    BitsCharsetDefinitionRegistry.find(name).newFactory.newInstance

Review comment:
       I just realized something concerning about this change to 
Charset.getCharset.  The scaladoc contract for 
`DaffodilCharsetProvider.charsetForName(name)` promised it would return null if 
the charset couldn't be found.  However, the scaladoc contract for 
`BitsCharsetDefinitionRegistry.find(name)` (well, there isn't one because you 
haven't written it yet, but if you did base it on find(name)'s existing code, 
it would say) promises it will throw a SDE if the charset can't be found.  
We've just changed `Charset.getCharset(name)`'s behavior on its error path, but 
there are two places in Daffodil code which still check if getCharset returns 
null on its error path:
   
   ```
   
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/EvEncoding.scala
   98:      tci.schemaDefinitionError("Unsupported encoding: %s. Supported 
encodings: %s", encString, CharsetUtils.supportedEncodingsString)
   
   daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala
   2115:      case null => Assert.usageError("Unsupported encoding: " + 
encodingName + ". Supported encodings: " + 
CharsetUtils.supportedEncodingsString)
   ```
   
   Both of these error paths throw exceptions as well, so you can simply delete 
their code since they're now unnecessary.  (P.S. I see you've already deleted 
the error path code from EvEncoding.scala but TDMLRunner.scala still has error 
path code.)  I think it's better to guarantee by contract that getCharset will 
return a Charset or throw a SDE since too many existing places just call 
`CharsetUtils.getCharset(name)` without checking for a null - these places are 
sure to throw a NPE on the error path anyway.
   
   However, I think we want to change 
`BitsCharsetDefinitionRegistry.find(name)` so it returns 
Option[BitsCharsetDefinition] instead of throwing a SDE and throw the SDE here 
from getCharset instead.  Someone may have an use case that requires them to 
programmatically look up an user-specified encoding and use that charset if it 
can be found or else fall back to a default charset without throwing a SDE.  Do 
you agree, @mbeckerle and @stevedlawrence?
   
   P.S. If we're going to expect someone to call 
`BitsCharsetDefinitionRegistry.find(name)` directly, I think we ought to chop 
`Registry` off the object's name; don't you think 
`BitsCharsetDefinition.find(name)` is a more readable call (especially since it 
will return `Option[BitsCharsetDefinition]`)?

##########
File path: 
daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/BitsCharsetDefinitionRegistry.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.processors.charset
+
+import org.apache.daffodil.exceptions.ThrowsSDE
+import org.apache.daffodil.util.SimpleNamedServiceLoader
+
+object BitsCharsetDefinitionRegistry {
+
+  private lazy val bitsCharsetDefinitionMap: Map[String, 
BitsCharsetDefinition] =
+    
SimpleNamedServiceLoader.loadClass[BitsCharsetDefinition](classOf[BitsCharsetDefinition])
+
+  /**
+   * Given name, finds the factory for the transformer. SDE otherwise.
+   *
+   * The state is passed in order to provide diagnostic context if not found.
+   */
+  def find(name: String, context: ThrowsSDE): BitsCharsetDefinition = {
+    val optDefinition: Option[BitsCharsetDefinition] = 
bitsCharsetDefinitionMap.get(name)
+    if (optDefinition.isEmpty) {
+      val choices = bitsCharsetDefinitionMap.keySet.mkString(", ")
+      context.SDE("The encoding '%s' was not found. Available choices are: 
%s", name, choices)
+    } else {
+      optDefinition.get
+    }
+  }
+
+  def find(name: String): BitsCharsetDefinition = {
+    val optDefinition: Option[BitsCharsetDefinition] = 
bitsCharsetDefinitionMap.get(name)
+    if (optDefinition.isEmpty) {
+      val choices = bitsCharsetDefinitionMap.keySet.mkString(", ")
+      SDE("The encoding '%s' was not found. Available choices are: %s", name, 
choices)
+    } else {
+      optDefinition.get
+    }
+  }
+
+  def SDE(id: String, args: Any*): Nothing = {

Review comment:
       You'll want to make this helper function private.
   
   P.S.  In later comments, I'm now suggesting that you change `find(name)` to 
return an `Option[BitsCharsetDefinition]` instead of throwing a SDE, move this 
private SDE function to CharsetUtils.scala, and call it in 
`CharsetUtils.getCharset(name)` instead.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to