[
https://issues.apache.org/jira/browse/OPENNLP-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733894#comment-17733894
]
ASF GitHub Bot commented on OPENNLP-1174:
-----------------------------------------
rzo1 commented on code in PR #541:
URL: https://github.com/apache/opennlp/pull/541#discussion_r1233251530
##########
opennlp-tools/src/main/java/opennlp/tools/util/featuregen/GeneratorFactory.java:
##########
@@ -95,25 +87,235 @@
public class GeneratorFactory {
/**
- * The {@link XmlFeatureGeneratorFactory} is responsible to construct
- * an {@link AdaptiveFeatureGenerator} from an given XML {@link Element}
- * which contains all necessary configuration if any.
+ * Creates an {@link AdaptiveFeatureGenerator} from an provided XML
descriptor.
+ * <p>
+ * Usually this XML descriptor contains a set of nested feature generators
+ * which are then used to generate the features by one of the opennlp
+ * components.
+ *
+ * @param xmlDescriptorIn the {@link InputStream} from which the descriptor
+ * is read, the stream remains open and must be
closed by the caller.
+ * @param resourceManager the resource manager which is used to resolve
resources
+ * referenced by a key in the descriptor
+ * @return created feature generators
+ * @throws IOException if an error occurs during reading from the descriptor
+ * {@link InputStream}
*/
- @Deprecated // TODO: (OPENNLP-1174) just remove when back-compat is no
longer needed
- interface XmlFeatureGeneratorFactory {
+ public static AdaptiveFeatureGenerator create(InputStream xmlDescriptorIn,
+
FeatureGeneratorResourceProvider resourceManager)
+ throws IOException {
- /**
- * Creates an {@link AdaptiveFeatureGenerator} from a the describing
- * XML element.
- *
- * @param generatorElement the element which contains the configuration
- * @param resourceManager the resource manager which could be used
- * to access referenced resources
- *
- * @return the configured {@link AdaptiveFeatureGenerator}
- */
- AdaptiveFeatureGenerator create(Element generatorElement,
- FeatureGeneratorResourceProvider resourceManager) throws
InvalidFormatException;
+ final org.w3c.dom.Document xmlDescriptorDOM = createDOM(xmlDescriptorIn);
+ final Element generatorElement = xmlDescriptorDOM.getDocumentElement();
+
+ return createGenerator(generatorElement, resourceManager);
+ }
+
+ /**
+ * Creates a {@link AdaptiveFeatureGenerator} for the provided element.
+ * To accomplish this it looks up the corresponding factory by the
+ * element tag name. The factory is then responsible for the creation
+ * of the generator from the element.
+ *
+ * @param generatorElement must not be {@code null}
+ * @param resourceManager may be {@code null}
+ * @return a {@link AdaptiveFeatureGenerator}
+ */
+ private static AdaptiveFeatureGenerator buildGenerator(Element
generatorElement,
+
FeatureGeneratorResourceProvider resourceManager)
+ throws InvalidFormatException {
+
+ if (generatorElement == null) {
+ throw new IllegalArgumentException("generatorElement must not be NULL");
+ }
+
+ final String className = generatorElement.getAttribute("class");
+ if (className.isBlank()) {
+ throw new InvalidFormatException("generator must have class attribute");
+ } else {
+ try {
+ final Class<?> factoryClass = Class.forName(className);
+ try {
+ final Constructor<?> constructor = factoryClass.getConstructor();
+ final AbstractXmlFeatureGeneratorFactory factory =
+ (AbstractXmlFeatureGeneratorFactory) constructor.newInstance();
+ factory.init(generatorElement, resourceManager);
+ return factory.create();
+ } catch (NoSuchMethodException | InvocationTargetException |
InstantiationException
+ | IllegalAccessException e) {
+ throw new RuntimeException(e);
+ }
+ } catch (ClassNotFoundException e) {
+ throw new RuntimeException(e);
+ }
+ }
+ }
+
+ /**
+ * Creates a {@link AdaptiveFeatureGenerator} for the provided element.
+ * To accomplish this it looks up the corresponding factory by the
+ * element tag name. The factory is then responsible for the creation
+ * of the generator from the element.
+ *
+ * @param generatorElement must not be {@code null}
+ * @param resourceManager may be {@code null}
+ * @return a {@link AdaptiveFeatureGenerator}
+ */
+ private static AdaptiveFeatureGenerator createGenerator(Element
generatorElement,
+
FeatureGeneratorResourceProvider resourceManager)
+ throws InvalidFormatException {
+
+ if (generatorElement == null) {
+ throw new IllegalArgumentException("generatorElement must not be NULL");
+ }
+
+ final String elementName = generatorElement.getTagName();
+
+ // check it is new format?
+ if ("featureGenerators".equals(elementName)) {
+
+ final List<AdaptiveFeatureGenerator> generators = new ArrayList<>();
+ final NodeList childNodes = generatorElement.getChildNodes();
+ for (int i = 0; i < childNodes.getLength(); i++) {
+ final Node childNode = childNodes.item(i);
+ if (childNode instanceof Element elem) {
+ final String type = elem.getTagName();
+ if ("generator".equals(type)) {
+ generators.add(buildGenerator(elem, resourceManager));
+ } else {
+ throw new InvalidFormatException("Unexpected element: " +
elementName);
+ }
+ }
+ }
+
+ AdaptiveFeatureGenerator featureGenerator;
+ if (generators.size() == 1) {
+ featureGenerator = generators.get(0);
+ } else if (generators.size() > 1) {
+ featureGenerator = new AggregatedFeatureGenerator(generators.toArray(
+ new AdaptiveFeatureGenerator[0]));
+ } else {
+ throw new InvalidFormatException("featureGenerators must have one or
more generators");
+ }
+
+ // disallow manually specifying CachedFeatureGenerator
+ if (featureGenerator instanceof CachedFeatureGenerator) {
+ throw new InvalidFormatException("CachedFeatureGeneratorFactory cannot
be specified manually." +
+ "Use cache=\"true\" attribute in featureGenerators element
instead.");
+ }
+
+ // check cache usage
+ if (Boolean.parseBoolean(generatorElement.getAttribute("cache"))) {
+ return new CachedFeatureGenerator(featureGenerator);
+ } else {
+ return featureGenerator;
+ }
+ } else {
+ throw new IllegalArgumentException(
+ "[OPENNLP-1174] - Classic configuration format is no longer
supported!");
+ }
+ }
+
+ private static org.w3c.dom.Document createDOM(InputStream xmlDescriptorIn)
+ throws IOException {
+
+ final DocumentBuilder documentBuilder = XmlUtil.createDocumentBuilder();
+
+ org.w3c.dom.Document xmlDescriptorDOM;
+
+ try {
+ xmlDescriptorDOM = documentBuilder.parse(xmlDescriptorIn);
+ } catch (SAXException e) {
+ throw new InvalidFormatException("Descriptor is not valid XML!", e);
+ }
+ return xmlDescriptorDOM;
+ }
+
+ public static Map<String, ArtifactSerializer<?>>
extractArtifactSerializerMappings(
+ InputStream xmlDescriptorIn) throws IOException {
+
+ final org.w3c.dom.Document xmlDescriptorDOM = createDOM(xmlDescriptorIn);
+ final Element element = xmlDescriptorDOM.getDocumentElement();
+
+ final String elementName = element.getTagName();
+
+ // check it is new format?
+ if ("featureGenerators".equals(elementName)) {
+ return addMappingsFromXmlChildren(element.getChildNodes(), new
HashMap<>());
+ } else {
+ throw new IllegalArgumentException(
+ "[OPENNLP-1174] - Classic configuration format is no longer
supported!");
+ }
+ }
+
+ private static void extractArtifactSerializerMappings(
+ Map<String, ArtifactSerializer<?>> mapping, Element element) {
+ final String className = element.getAttribute("class");
+ if (!className.isBlank()) {
+ try {
+ final Class<?> factoryClass = Class.forName(className);
+ try {
+ final Constructor<?> constructor = factoryClass.getConstructor();
+ final AbstractXmlFeatureGeneratorFactory factory =
+ (AbstractXmlFeatureGeneratorFactory) constructor.newInstance();
+ factory.init(element, null);
+ final Map<String, ArtifactSerializer<?>> map =
factory.getArtifactSerializerMapping();
+ if (map != null) {
+ mapping.putAll(map);
+ }
+ } catch (NoSuchMethodException | InvocationTargetException |
InstantiationException
+ | IllegalAccessException e) {
+ throw new RuntimeException(e);
+ } catch (InvalidFormatException ignored) {
+ }
+ } catch (ClassNotFoundException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ addMappingsFromXmlChildren(element.getChildNodes(), mapping);
+ }
+
+ private static Map<String, ArtifactSerializer<?>> addMappingsFromXmlChildren(
+ final NodeList nodes, final Map<String, ArtifactSerializer<?>> mapping) {
+ for (int i = 0; i < nodes.getLength(); i++) {
+ if (nodes.item(i) instanceof Element childElem) {
+ if ("generator".equals(childElem.getTagName())) {
+ extractArtifactSerializerMappings(mapping, childElem);
+ }
+ }
+ }
+ return mapping;
+ }
+
+ /**
+ * Provides a list with all the elements in the xml feature descriptor.
+ *
+ * @param xmlDescriptorIn the xml feature descriptor
+ * @return a list containing all elements
+ * @throws IOException if inputstream cannot be open
+ * @throws InvalidFormatException if xml is not well-formed
+ */
+ public static List<Element> getDescriptorElements(InputStream
xmlDescriptorIn)
+ throws IOException {
+
+ final List<Element> elements = new ArrayList<>();
+ final org.w3c.dom.Document xmlDescriptorDOM = createDOM(xmlDescriptorIn);
+ final XPath xPath = XPathFactory.newInstance().newXPath();
+ NodeList allElements;
+ try {
+ final XPathExpression exp = xPath.compile("//*");
+ allElements = (NodeList)
exp.evaluate(xmlDescriptorDOM.getDocumentElement(), XPathConstants.NODESET);
+ } catch (XPathExpressionException e) {
+ throw new IllegalStateException("The hard coded XPath expression should
always be valid!");
Review Comment:
Should be `InvalidFormatException` here, which is already documented in the
JavaDoc but code was misbehaving. Changed to `InvalidFormatException`
> remove classic format support in feature generator XML config when it is no
> longer needed
> -----------------------------------------------------------------------------------------
>
> Key: OPENNLP-1174
> URL: https://issues.apache.org/jira/browse/OPENNLP-1174
> Project: OpenNLP
> Issue Type: Planned Work
> Reporter: Koji Sekiguchi
> Assignee: Richard Zowalla
> Priority: Minor
>
> I put many "TODO" marks in the patch for OPENNLP-1154, e.g.:
> {code:java}
> @Deprecated // TODO: just remove when back-compat is no longer needed
> static void register(Map<String, GeneratorFactory.XmlFeatureGeneratorFactory>
> factoryMap) {
> factoryMap.put("generators", new AggregatedFeatureGeneratorFactory());
> }
> {code}
> Before merging it, I'd like to create this ticket to get ticket number and
> add the number in the patch for OPENNLP-1154 so that we can easily find
> unnecessary codes when we want to remove in the future..
--
This message was sent by Atlassian Jira
(v8.20.10#820010)