gnodet commented on code in PR #12147:
URL: https://github.com/apache/maven/pull/12147#discussion_r3293943232


##########
impl/maven-impl/src/main/java/org/apache/maven/impl/DefaultModelXmlFactory.java:
##########
@@ -88,10 +97,35 @@ private Model doRead(XmlReaderRequest request) throws 
XmlReaderException {
             throw new IllegalArgumentException("path, url, reader or 
inputStream must be non null");
         }
         try {
+            String modelId = request.getModelId();
+            String location = request.getLocation();
+
+            if (modelId == null) {
+                if (inputStream != null) {
+                    ByteArrayOutputStream baos = new ByteArrayOutputStream();
+                    inputStream.transferTo(baos);
+                    byte[] buf = baos.toByteArray();
+                    modelId = extractModelId(new ByteArrayInputStream(buf));

Review Comment:
   When `modelId` is null and the input comes from a stream/reader, the entire 
content is buffered into memory (`ByteArrayOutputStream` / `CharArrayWriter`) 
just to extract the GAV. For very large POMs this is fine, but there's a 
subtlety: if the stream/reader was already partially consumed before reaching 
here (unlikely but defensive), this would silently lose data.
   
   Also, consider whether the `extractModelId` parse could share a factory with 
the main `MavenStaxReader` parse to avoid creating two `XMLStreamReader` 
instances for every POM read.



##########
impl/maven-impl/src/main/java/org/apache/maven/impl/DefaultModelXmlFactory.java:
##########
@@ -115,6 +149,130 @@ private Model doRead(XmlReaderRequest request) throws 
XmlReaderException {
         }
     }
 
+    static class InputFactoryHolder {
+        static final XMLInputFactory XML_INPUT_FACTORY;
+
+        static {
+            XMLInputFactory factory = XMLInputFactory.newFactory();
+            
factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, true);
+            factory.setProperty(XMLInputFactory.IS_COALESCING, true);
+            XML_INPUT_FACTORY = factory;
+        }
+    }
+
+    private String extractModelId(InputStream inputStream) {
+        try {
+            XMLStreamReader reader = 
InputFactoryHolder.XML_INPUT_FACTORY.createXMLStreamReader(inputStream);
+            try {
+                return extractModelId(reader);
+            } finally {
+                reader.close();
+            }
+        } catch (Exception e) {
+            return null;
+        }
+    }
+
+    private String extractModelId(Reader reader) {
+        try {
+            XMLStreamReader xmlReader = 
InputFactoryHolder.XML_INPUT_FACTORY.createXMLStreamReader(reader);
+            try {
+                return extractModelId(xmlReader);
+            } finally {
+                xmlReader.close();
+            }
+        } catch (Exception e) {
+            return null;
+        }
+    }
+
+    private static String extractModelId(XMLStreamReader reader) throws 
XMLStreamException {
+        String groupId = null;
+        String artifactId = null;
+        String version = null;
+        String parentGroupId = null;
+        String parentVersion = null;
+
+        boolean inProject = false;
+        boolean inParent = false;
+        String currentElement = null;
+
+        while (reader.hasNext()) {
+            int event = reader.next();
+
+            if (event == XMLStreamConstants.START_ELEMENT) {
+                String localName = reader.getLocalName();
+
+                if ("project".equals(localName)) {
+                    inProject = true;
+                } else if ("parent".equals(localName) && inProject) {
+                    inParent = true;
+                } else if (inProject
+                        && ("groupId".equals(localName)
+                                || "artifactId".equals(localName)
+                                || "version".equals(localName))) {
+                    currentElement = localName;
+                }
+            } else if (event == XMLStreamConstants.END_ELEMENT) {
+                String localName = reader.getLocalName();
+
+                if ("parent".equals(localName)) {
+                    inParent = false;
+                } else if ("project".equals(localName)) {
+                    break;
+                }
+                currentElement = null;
+            } else if (event == XMLStreamConstants.CHARACTERS && 
currentElement != null) {
+                String text = reader.getText().trim();
+                if (!text.isEmpty()) {
+                    if (inParent) {
+                        switch (currentElement) {
+                            case "groupId":
+                                parentGroupId = text;
+                                break;
+                            case "version":
+                                parentVersion = text;
+                                break;
+                            default:
+                                break;
+                        }
+                    } else {
+                        switch (currentElement) {
+                            case "groupId":
+                                groupId = text;
+                                break;

Review Comment:
   The early-exit optimization `if (artifactId != null && groupId != null && 
version != null)` only checks direct project-level coordinates. If groupId is 
inherited from the parent and appears *after* artifactId+version in the POM, 
the parser will still continue through the entire `<project>` element. This is 
correct behavior (the fallback to parent values happens after the loop), but 
the early-exit condition could also include `parentGroupId`/`parentVersion` to 
exit sooner in the common case where parent is declared before the child 
elements:
   
   ```suggestion
               if (artifactId != null
                       && (groupId != null || parentGroupId != null)
                       && (version != null || parentVersion != null)) {
                   break;
               }
   ```



##########
impl/maven-impl/src/main/java/org/apache/maven/impl/DefaultModelXmlFactory.java:
##########
@@ -115,6 +149,130 @@ private Model doRead(XmlReaderRequest request) throws 
XmlReaderException {
         }
     }
 
+    static class InputFactoryHolder {

Review Comment:
   The `InputFactoryHolder` lazy-init pattern is good. However, 
`XMLInputFactory.newFactory()` is not guaranteed to be thread-safe across all 
implementations. Since this is a `static final`, it will be shared across 
threads. The properties set here (`IS_REPLACING_ENTITY_REFERENCES`, 
`IS_COALESCING`) are set once at init, so the factory itself is safe — but note 
that some StAX implementations have thread-safety issues when *creating 
readers* from a shared factory. The Woodstox implementation used by Maven 
should be fine, but it's worth a comment.
   
   Also, note that `IS_COALESCING = true` may have a minor performance cost 
since it forces the parser to merge adjacent character events. For the limited 
extraction done here (just groupId/artifactId/version), it's probably 
negligible.



##########
impl/maven-impl/src/test/java/org/apache/maven/impl/DefaultModelXmlFactoryTest.java:
##########
@@ -122,6 +124,51 @@ void testMalformedModelVersion() throws Exception {
         assertEquals("invalid.version", model.getModelVersion());
     }
 
+    @Test
+    void testExtractsModelIdIntoInputSource() throws Exception {
+        String xml = """
+                <project xmlns="http://maven.apache.org/POM/4.0.0";>
+                  <modelVersion>4.0.0</modelVersion>
+                  <groupId>org.example</groupId>
+                  <artifactId>child</artifactId>
+                  <version>1</version>
+                </project>""";
+
+        Model model = factory.read(XmlReaderRequest.builder()
+                .reader(new StringReader(xml))
+                .location("pom.xml")
+                .strict(true)
+                .build());
+
+        InputSource source = model.getLocation("").getSource();
+        assertNotNull(source);
+        assertEquals("org.example:child:1", source.getModelId());
+    }
+

Review Comment:
   Good test coverage for the happy path. Consider also adding a test for 
InputStream-based reading (not just Reader), since the buffering paths differ 
(`ByteArrayOutputStream` vs `CharArrayWriter`). Also, a test where no modelId 
can be extracted (e.g., a POM missing `<artifactId>`) would verify the 
null-fallback behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to